Know You Before You Speak: User-State Modeling for LLM Personalization in Multi-Turn Conversation

Bingbing Xu; Jiani Luo; Shuyi Miao; Stefan Konigorski; Tat-Seng Chua; Xiaoyan Zhao; Yang Zhang

arxiv: 2605.24647 · v1 · pith:W5MERQBVnew · submitted 2026-05-23 · 💻 cs.CL

Jiani Luo, Xiaoyan Zhao, Yang Zhang, Shuyi Miao, Bingbing Xu, Stefan Konigorski, Tat-Seng Chua

Pith reviewed 2026-06-30 13:23 UTC · model grok-4.3

classification 💻 cs.CL

keywords personalized dialogue, user state modeling, free energy principle, multi-turn conversation, latent states, action selection, LLM personalization, partial observability

The pith

Explicit modeling of latent user states and their action-conditioned dynamics enables better long-term action selection in personalized dialogue.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that effective personalization requires inferring hidden user states that evolve over turns and choosing responses based on how those states are likely to change. It introduces PUMA, a framework that maintains a belief distribution over these states, updates a model of how observations arise and how actions drive transitions, and picks the next action by minimizing expected free energy. This replaces passive reuse of past facts with active planning over future user evolution. Experiments on annotated counseling benchmarks report gains in long-horizon outcomes alongside preserved response quality, and cross-dataset tests show improved state estimation accuracy. Readers would care because most current systems still treat user information as static memory rather than a dynamic process that responses can steer.

Core claim

PUMA formulates personalization as decision-making under partial observability centered on an explicit user state model that captures latent user states and their action-conditioned dynamics. At each turn PUMA maintains a belief over the hidden state, refines the model for observation generation and action-conditioned state transition, and selects dialogue actions by minimizing expected free energy, balancing epistemic and pragmatic objectives under a unified criterion. This shifts personalization from passive memory retrieval to model-based decision-making over user evolution, with experiments showing improved long-horizon outcomes on healthcare-oriented counseling benchmarks and more relia

What carries the argument

PUMA, the prospective user-state modeling framework that maintains a belief over hidden states and chooses actions by minimizing expected free energy under the Free Energy Principle.
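The belief-maintenance step can be sketched as a standard Bayesian filter, assuming a small discrete state space. The three user states, two counselor actions, two observation types, and every probability below are hypothetical placeholders, not names or values from the paper, whose actual parameterization is not visible in the abstract. The filter first predicts the next state with the action-conditioned transition model, then corrects that prediction with the observation likelihood.

```python
# Hypothetical toy model: states, actions, observations, and probabilities
# are illustrative placeholders, not taken from the paper.
STATES = ["resistant", "ambivalent", "ready"]

# TRANSITION[action][s][t] = P(next state t | current state s, action)
TRANSITION = {
    "reflect": [[0.6, 0.4, 0.0],
                [0.1, 0.6, 0.3],
                [0.0, 0.1, 0.9]],
    "advise":  [[0.9, 0.1, 0.0],
                [0.3, 0.5, 0.2],
                [0.0, 0.2, 0.8]],
}

# LIKELIHOOD[s][o] = P(observation o | state s)
# Observation 0 = sustain talk, observation 1 = change talk.
LIKELIHOOD = [[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]]


def predict(belief, action):
    """Push the belief through the action-conditioned transition model."""
    rows = TRANSITION[action]
    n = len(belief)
    return [sum(belief[s] * rows[s][t] for s in range(n)) for t in range(n)]


def correct(prior, obs):
    """Reweight the predicted belief by the observation likelihood (Bayes rule)."""
    unnorm = [prior[s] * LIKELIHOOD[s][obs] for s in range(len(prior))]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


def update_belief(belief, action, obs):
    """One turn: predict with the chosen action, then correct with the user's reply."""
    return correct(predict(belief, action), obs)


# Start from a uniform belief, take the "reflect" action, observe change talk.
belief = [1 / 3, 1 / 3, 1 / 3]
belief = update_belief(belief, "reflect", obs=1)
print(dict(zip(STATES, (round(b, 3) for b in belief))))
```

In this toy setup, observing change talk after a reflective action shifts probability mass toward the "ready" state, which is the kind of evolving belief the framework is described as maintaining at each turn.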

If this is right

Long-horizon dialogue outcomes improve on healthcare counseling and motivational interviewing benchmarks while response quality stays high.
User-state estimation and next-state prediction become more reliable when evaluated across multiple datasets.
Action selection can be driven by predicted effects on future user states rather than by retrieval of past observations alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same belief-update and free-energy minimization loop could be applied to tutoring or customer-service dialogues where user knowledge or intent also evolves.
If the learned transition model proves stable, it could be reused across sessions to initialize beliefs without starting from scratch each time.
Combining the explicit state model with larger language models might allow the system to generate both the belief update and the chosen response in a single forward pass.

Load-bearing premise

An explicit latent user-state model with action-conditioned transition dynamics can be learned from data and used to minimize expected free energy in a way that produces measurably better long-horizon outcomes than memory- or profile-based baselines.

What would settle it

A controlled replication on the same counseling benchmarks in which PUMA shows no statistically significant gain on long-horizon outcome metrics or in which its state-estimation accuracy falls to the level of the memory-based baselines.
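One way such a replication could check for a significant gain is a paired test over per-dialogue outcome scores. The sketch below uses an exact sign-flip permutation test, which is one non-parametric option among several; the function name `paired_sign_flip_test` and the scores are invented for illustration and are not results from the paper.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Returns (mean difference, p-value). Under the null hypothesis that the two
    systems are exchangeable, each paired difference is equally likely to have
    either sign, so all 2**n sign patterns are enumerated.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so the observed pattern itself always counts.
        if flipped >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total


# Synthetic per-dialogue outcome scores, for illustration only.
system_scores = [0.72, 0.65, 0.80, 0.58, 0.77, 0.69, 0.74, 0.61]
baseline_scores = [0.66, 0.60, 0.71, 0.59, 0.70, 0.62, 0.69, 0.57]

mean_diff, p_value = paired_sign_flip_test(system_scores, baseline_scores)
print(f"mean difference = {mean_diff:.4f}, p = {p_value:.4f}")
```

Full enumeration is only practical for small samples, since the number of sign patterns doubles with each added pair. For larger evaluation sets, a random subsample of sign patterns or a standard paired test would be the usual substitute.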

Figures

Figures reproduced from arXiv: 2605.24647 by Bingbing Xu, Jiani Luo, Shuyi Miao, Stefan Konigorski, Tat-Seng Chua, Xiaoyan Zhao, Yang Zhang.

**Figure 2.** Temporal analysis of state prediction and readiness progression.

**Figure 3.** A simulated MI-coaching session between a counselor and a patient.

Original abstract

Personalized dialogue requires more than recalling explicit user histories: systems also need to infer hidden user states that evolve through interaction and shape appropriate response strategies. Existing memory- and profile-based methods primarily reuse observable user information, offering limited support for modeling user-state dynamics or selecting actions based on how they shape future user states. We propose PUMA (Prospective User-state Modeling for Action selection), a framework grounded in the Free Energy Principle (FEP) that formulates personalization as decision-making under partial observability, centered on an explicit user state model that captures latent user states and their action-conditioned dynamics. At each turn, PUMA maintains a belief over the user's hidden state, refines the user state model for observation generation and action-conditioned state transition, and selects dialogue actions by minimizing expected free energy, balancing epistemic and pragmatic objectives under a unified criterion. This formulation shifts personalization from passive memory retrieval to model-based decision-making over user evolution. We instantiate PUMA on healthcare-oriented counseling and motivational interviewing benchmarks with latent state annotations for rigorous evaluation. Experiments show that PUMA improves long-horizon dialogue outcomes while maintaining strong response quality, and a cross-dataset study demonstrates more reliable user-state estimation and next-state prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

PUMA applies the Free Energy Principle to maintain beliefs over latent user states and select dialogue actions by minimizing expected free energy, but the reported gains on long-horizon outcomes rest on evidence not visible in the abstract.

The main thing to know is that this paper treats personalization as active inference: it keeps a belief over a hidden user state, learns action-conditioned transitions, and picks responses to minimize expected free energy.

What is new is the explicit combination of FEP machinery with dialogue action selection. They define a user-state model that generates observations and predicts next states, then use that model for decision-making rather than simple memory lookup. The instantiation on healthcare counseling and motivational interviewing benchmarks with latent-state annotations lets them evaluate both response quality and state estimation directly.

The formulation is clean. It unifies epistemic value (reducing uncertainty about the user) and pragmatic value (producing good replies) under one objective, and the cross-dataset check on state prediction adds a useful test.
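For readers who want the objective written out, the form below is the decomposition commonly used in the active inference literature. It is offered as a reference point under assumed notation, not as the paper's own equation, which the abstract does not show. Here $s$ is the hidden user state, $o$ the observed user response, $a$ a candidate dialogue action, $Q$ the model's predictive beliefs, and $P(o \mid C)$ a preference distribution over outcomes; all of these symbols are assumptions.

```latex
G(a) =
  - \underbrace{\mathbb{E}_{Q(o \mid a)}\Big[ D_{\mathrm{KL}}\big( Q(s \mid o, a) \,\|\, Q(s \mid a) \big) \Big]}_{\text{epistemic value}}
  - \underbrace{\mathbb{E}_{Q(o \mid a)}\big[ \ln P(o \mid C) \big]}_{\text{pragmatic value}}
```

The first term is the expected information gain about the user state from the next observation, and the second is the expected log-preference for that observation. Minimizing $G(a)$ therefore favors actions that are expected to be both informative and close to preferred outcomes.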

The soft spot is the results. The abstract says PUMA improves long-horizon outcomes and state estimation, yet supplies no numbers, baselines, or statistical details. Without those, it is impossible to judge whether the gains are real or whether the model simply fits the provided annotations. The full paper presumably contains the tables; that section will decide how much weight the claims carry.

This is for people working on long-horizon dialogue agents in applied settings such as healthcare. Readers already comfortable with POMDPs or active inference will see the mapping quickly.

It deserves a serious referee to examine the experiments and check whether the empirical improvements hold up against standard memory and profile baselines.

Referee Report

2 major / 2 minor

Summary. The paper proposes PUMA (Prospective User-state Modeling for Action selection), a framework grounded in the Free Energy Principle that formulates LLM personalization in multi-turn dialogue as decision-making under partial observability. It maintains a belief over latent user states, refines an explicit user-state model with action-conditioned transitions, and selects actions by minimizing expected free energy to balance epistemic and pragmatic objectives. The approach is evaluated on healthcare-oriented counseling and motivational interviewing benchmarks that include latent state annotations; the central claims are improved long-horizon dialogue outcomes relative to memory- and profile-based baselines together with more reliable user-state estimation and next-state prediction in a cross-dataset study.

Significance. If the empirical results hold with appropriate controls, the work would provide a principled, model-based alternative to passive memory retrieval for personalization, potentially improving coherence over extended interactions. The explicit use of expected free energy offers a unified criterion that could be adopted in other interactive settings; the provision of latent-state annotations on the benchmarks is a positive contribution for reproducible evaluation of state-tracking methods.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the claims of improved long-horizon outcomes and more reliable state estimation are stated without any reported metrics, baselines, effect sizes, or statistical tests, so the central empirical assertion cannot be assessed from the supplied evidence.
[§3] §3 (PUMA formulation): the action-selection step minimizes expected free energy with respect to the learned user-state model; it is unclear whether the reported gains remain after controlling for the quality of the fitted transition dynamics or whether they are partly circular with the model itself.

minor comments (2)

Define all acronyms at first use (FEP, PUMA, etc.) and ensure consistent notation for belief states and expected free energy across sections.
The cross-dataset study would benefit from an explicit description of how the latent-state annotations were obtained and how inter-annotator agreement was measured.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions we will make to improve the clarity and rigor of the empirical claims and formulation.

Referee: [Abstract / Experiments] Abstract and Experiments section: the claims of improved long-horizon outcomes and more reliable state estimation are stated without any reported metrics, baselines, effect sizes, or statistical tests, so the central empirical assertion cannot be assessed from the supplied evidence.

Authors: We agree that the abstract and Experiments section would benefit from explicit quantitative reporting. The current manuscript summarizes outcomes at a high level. We will revise the abstract to include key metrics (e.g., dialogue success rates, state estimation accuracy) with comparisons to baselines, and expand the Experiments section to report effect sizes and results of statistical tests (paired t-tests or appropriate non-parametric equivalents with p-values) for all central claims.
revision: yes

Referee: [§3] §3 (PUMA formulation): the action-selection step minimizes expected free energy with respect to the learned user-state model; it is unclear whether the reported gains remain after controlling for the quality of the fitted transition dynamics or whether they are partly circular with the model itself.

Authors: We appreciate this clarification request. The user-state model is fit from data, and action selection proceeds by minimizing expected free energy under that model; the reported improvements are shown relative to memory- and profile-based baselines that lack this prospective modeling. To address potential circularity, we will add an ablation that holds the learned transition dynamics fixed and varies only the action-selection criterion (comparing expected free energy minimization against simpler selection strategies). This will isolate the contribution of the decision-making step.
revision: partial
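The proposed ablation can be sketched as follows, under stated assumptions: one fixed toy model is shared by two selection criteria, expected free energy and a pragmatic-only rule that drops the information-gain term. The states, actions, probabilities, preference vector, and function names are all hypothetical, and the expected free energy uses the standard information-gain-plus-preference form from the active inference literature rather than the paper's own definition.

```python
import math

# Hypothetical fixed model, shared by both selection criteria.
# TRANSITION[action][s][t] = P(next state t | state s, action)
TRANSITION = {
    "reflect": [[0.6, 0.4, 0.0], [0.1, 0.6, 0.3], [0.0, 0.1, 0.9]],
    "advise":  [[0.9, 0.1, 0.0], [0.3, 0.5, 0.2], [0.0, 0.2, 0.8]],
}
# LIKELIHOOD[s][o] = P(observation o | state s)
LIKELIHOOD = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
# Preferred distribution over observations (assumed to favor observation 1).
PREFERENCE = [0.1, 0.9]


def predict_state(belief, action):
    rows = TRANSITION[action]
    n = len(belief)
    return [sum(belief[s] * rows[s][t] for s in range(n)) for t in range(n)]


def predict_obs(state_dist):
    n_obs = len(LIKELIHOOD[0])
    return [sum(state_dist[s] * LIKELIHOOD[s][o] for s in range(len(state_dist)))
            for o in range(n_obs)]


def posterior(state_dist, obs):
    unnorm = [state_dist[s] * LIKELIHOOD[s][obs] for s in range(len(state_dist))]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def kl(p, q):
    """KL divergence D(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def pragmatic_value(belief, action):
    """Expected log-preference of the predicted observation."""
    q_obs = predict_obs(predict_state(belief, action))
    return sum(q * math.log(c) for q, c in zip(q_obs, PREFERENCE))


def epistemic_value(belief, action):
    """Expected information gain about the state from the next observation."""
    q_state = predict_state(belief, action)
    q_obs = predict_obs(q_state)
    return sum(q_obs[o] * kl(posterior(q_state, o), q_state)
               for o in range(len(q_obs)) if q_obs[o] > 0.0)


def expected_free_energy(belief, action):
    return -epistemic_value(belief, action) - pragmatic_value(belief, action)


def pragmatic_only(belief, action):
    return -pragmatic_value(belief, action)


def select_action(belief, criterion):
    """Pick the action with the lowest score under the given criterion."""
    return min(TRANSITION, key=lambda a: criterion(belief, a))


belief = [0.5, 0.4, 0.1]
for name, criterion in [("EFE", expected_free_energy),
                        ("pragmatic-only", pragmatic_only)]:
    print(name, "->", select_action(belief, criterion))
```

Because both selectors read the same `TRANSITION` and `LIKELIHOOD` tables, any difference in downstream outcomes would be attributable to the selection criterion alone. A real ablation would run both over many dialogues and compare outcome metrics rather than single-turn choices.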

Circularity Check

0 steps flagged

No significant circularity identified

The provided abstract and description present PUMA as an application of the externally established Free Energy Principle to dialogue personalization via standard POMDP-style belief maintenance, action-conditioned transitions, and expected free energy minimization. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are shown that reduce the central formulation to its own inputs by construction. The empirical claims (long-horizon gains over baselines, reliable state estimation) are framed as testable outcomes rather than definitional or self-referential. This is a normal non-circular case of importing an established framework and evaluating it empirically.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review; no explicit free parameters, axioms, or invented entities are enumerated beyond the high-level claim that the method is grounded in the Free Energy Principle.

axioms (1)

domain assumption: The Free Energy Principle provides a suitable normative criterion for action selection under partial observability in dialogue.
The framework is explicitly grounded in FEP for formulating personalization as decision-making.

invented entities (1)

PUMA user-state model (no independent evidence)
purpose: Captures latent user states and their action-conditioned dynamics for belief maintenance and action selection.
New explicit model introduced as the center of the framework.

Reference graph

Works this paper leans on

68 extracted references · 16 canonical work pages · 4 internal anchors

[1]

Large language models empowered personalized web agents

Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, and Tat-Seng Chua. Large language models empowered personalized web agents. InProceedings of the ACM on Web Conference 2025, pages 198–215, 2025

2025
[2]

A survey on dialogue systems: Recent advances and new frontiers.Acm Sigkdd Explorations Newsletter, 19(2):25–35, 2017

Hongshen Chen, Xiaorui Liu, Dawei Yin, and Jiliang Tang. A survey on dialogue systems: Recent advances and new frontiers.Acm Sigkdd Explorations Newsletter, 19(2):25–35, 2017

2017
[3]

A personalized dialogue generator with implicit user persona detection.arXiv preprint arXiv:2204.07372, 2022

Itsugun Cho, Dongyang Wang, Ryota Takahashi, and Hiroaki Saito. A personalized dialogue generator with implicit user persona detection.arXiv preprint arXiv:2204.07372, 2022

work page arXiv 2022
[4]

Large language models (llms) inference offloading and resource allocation in cloud-edge networks: An active inference approach

Jingcheng Fang, Ying He, F Richard Yu, Jianqiang Li, and Victor C Leung. Large language models (llms) inference offloading and resource allocation in cloud-edge networks: An active inference approach. In2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall), pages 1–5. IEEE, 2023

2023
[5]

The free-energy principle: a unified brain theory?Nature reviews neuroscience, 11(2):127–138, 2010

Karl Friston. The free-energy principle: a unified brain theory?Nature reviews neuroscience, 11(2):127–138, 2010

2010
[6]

A free energy principle for the brain.Journal of Physiology-Paris, 100(1–3):70–87, 2006

Karl Friston, James Kilner, and Lee Harrison. A free energy principle for the brain.Journal of Physiology-Paris, 100(1–3):70–87, 2006. doi: 10.1016/j.jphysparis.2006.10.001

work page doi:10.1016/j.jphysparis.2006.10.001 2006
[7]

Active inference and epistemic value.Cognitive neuroscience, 6(4):187–214, 2015

Karl Friston, Francesco Rigoli, Dimitri Ognibene, Christoph Mathys, Thomas Fitzgerald, and Giovanni Pezzulo. Active inference and epistemic value.Cognitive neuroscience, 6(4):187–214, 2015

2015
[8]

Active inference and learning.Neuroscience & Biobehavioral Reviews, 68:862–879, 2016

Karl Friston, Thomas FitzGerald, Francesco Rigoli, Philipp Schwartenbeck, Giovanni Pezzulo, et al. Active inference and learning.Neuroscience & Biobehavioral Reviews, 68:862–879, 2016

2016
[9]

Active inference: A process theory.Neural Computation, 29(1):1–49, 2017

Karl Friston, Thomas FitzGerald, Francesco Rigoli, Philipp Schwartenbeck, and Giovanni Pezzulo. Active inference: A process theory.Neural Computation, 29(1):1–49, 2017. doi: 10.1162/NECO_a_00912

work page doi:10.1162/neco_a_00912 2017
[10]

Action and behavior: a free-energy formulation.Biological cybernetics, 102(3):227–260, 2010

Karl J Friston, Jean Daunizeau, James Kilner, and Stefan J Kiebel. Action and behavior: a free-energy formulation.Biological cybernetics, 102(3):227–260, 2010

2010
[11]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Ying He, Jingcheng Fang, F Richard Yu, and Victor C Leung. Large language models (llms) inference offloading and resource allocation in cloud-edge computing: An active inference approach.IEEE Transactions on Mobile Computing, 23(12):11253–11264, 2024

2024
[13]

Mem-pal: Towards memory-based personalized dialogue assistants for long- term user-agent interaction

Zhaopei Huang, Qifeng Dai, Guozheng Wu, Xiaopeng Wu, Xubin Li, Tiezheng Ge, Wenxuan Wang, and Qin Jin. Mem-pal: Towards memory-based personalized dialogue assistants for long- term user-agent interaction. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 31229–31237, 2026

2026
[14]

Dialogue management by estimating user’s internal state using movie recommendation dialogue.Journal of Natural Language Processing, 28(1):104–135, 2021

Takashi Kodama, Ribeka Tanaka, and Sadao Kurohashi. Dialogue management by estimating user’s internal state using movie recommendation dialogue.Journal of Natural Language Processing, 28(1):104–135, 2021. 10

2021
[15]

Aligning to thousands of preferences via system message generalization.Advances in Neural Information Processing Systems, 37:73783–73829, 2024

Seongyun Lee, Sue Hyun Park, Seungone Kim, and Minjoon Seo. Aligning to thousands of preferences via system message generalization.Advances in Neural Information Processing Systems, 37:73783–73829, 2024

2024
[16]

Hello again! llm-powered personalized agent for long-term dialogue

Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, and Tat-Seng Chua. Hello again! llm-powered personalized agent for long-term dialogue. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5259–5276, 2025

2025
[17]

A persona-aware llm-enhanced framework for multi-session personalized dialogue generation

Dongshuo Liu, Zhijing Wu, Dandan Song, and He-Yan Huang. A persona-aware llm-enhanced framework for multi-session personalized dialogue generation. InFindings of the Association for Computational Linguistics: ACL 2025, pages 103–123, 2025

2025
[18]

L-mtp: Leap multi-token prediction beyond adjacent context for large language models.Advances in Neural Information Processing Systems, 38: 102569–102600, 2026

Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, and Tat-Seng Chua. L-mtp: Leap multi-token prediction beyond adjacent context for large language models.Advances in Neural Information Processing Systems, 38: 102569–102600, 2026

2026
[19]

Miracle: Towards personalized dialogue generation with latent-space multiple personal attribute control

Zhenyi Lu, Wei Wei, Xiaoye Qu, XianLing Mao, Dangyang Chen, and Jixiong Chen. Miracle: Towards personalized dialogue generation with latent-space multiple personal attribute control. arXiv preprint arXiv:2310.18342, 2023

work page arXiv 2023
[20]

Odar: Principled adaptive routing for llm reasoning via active inference.arXiv preprint arXiv:2602.23681, 2026

Siyuan Ma, Bo Gao, Xiaojun Jia, Simeng Qin, Tianlin Li, Ke Ma, Xiaoshuang Jia, Wenqi Ren, and Yang Liu. Odar: Principled adaptive routing for llm reasoning via active inference.arXiv preprint arXiv:2602.23681, 2026

work page arXiv 2026
[21]

Maple: A framework for active preference learning guided by large language models

Saaduddin Mahmud, Mason Nakamura, and Shlomo Zilberstein. Maple: A framework for active preference learning guided by large language models. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27518–27528, 2025

2025
[22]

UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains

Shuyi Miao, Wangjie Qiu, Shengda Zhuo, Fei Shen, Dan Lin, Xingtong Yu, Chua Tat-Seng, and Zhiming Zheng. Unidetect: Llm-driven universal fraud detection across heterogeneous blockchains.arXiv preprint arXiv:2604.12329, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[23]

Active inference and human– computer interaction.ACM Transactions on Computer-Human Interaction, 32(6):1–45, 2025

Roderick Murray-Smith, John H Williamson, and Sebastian Stein. Active inference and human– computer interaction.ACM Transactions on Computer-Human Interaction, 32(6):1–45, 2025

2025
[24]

Generating meaning: active inference and the scope and limits of passive ai.Trends in Cognitive Sciences, 28(2): 97–112, 2024

Giovanni Pezzulo, Thomas Parr, Paul Cisek, Andy Clark, and Karl Friston. Generating meaning: active inference and the scope and limits of passive ai.Trends in Cognitive Sciences, 28(2): 97–112, 2024

2024
[25]

Active preference inference using language models and probabilistic reasoning

Top Piriyakulkij, V olodymyr Kuleshov, and Kevin Ellis. Active preference inference using language models and probabilistic reasoning. InNeurIPS 2023 Foundation Models for Decision Making Workshop, 2025

2023
[26]

Initial efficacy of mi, ttm tailoring and hri’s with multiple behaviors for employee health promotion.Preventive medicine, 46(3):226–231, 2008

James O Prochaska, Susan Butterworth, Colleen A Redding, Verna Burden, Nancy Perrin, Michael Leo, Marna Flaherty-Robb, and Janice M Prochaska. Initial efficacy of mi, ttm tailoring and hri’s with multiple behaviors for employee health promotion.Preventive medicine, 46(3):226–231, 2008

2008
[27]

Latent inter-user difference modeling for llm personalization

Yilun Qiu, Tianhao Shi, Xiaoyan Zhao, Fengbin Zhu, Yang Zhang, and Fuli Feng. Latent inter-user difference modeling for llm personalization. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 10610–10628, 2025

2025
[28]

Measuring what makes you unique: Difference-aware user modeling for enhancing llm personalization

Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yimeng Bai, Wenjie Wang, Hong Cheng, Fuli Feng, and Tat-Seng Chua. Measuring what makes you unique: Difference-aware user modeling for enhancing llm personalization. InFindings of the Association for Computational Linguistics: ACL 2025, pages 21258–21277, 2025

2025
[29]

Raptor: Recursive abstractive processing for tree-organized retrieval

Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D Manning. Raptor: Recursive abstractive processing for tree-organized retrieval. InThe Twelfth International Conference on Learning Representations, 2024. 11

2024
[30]

Average user-side counterfactual fairness for collaborative filtering.ACM Transactions on Information Systems, 42(5):1–26, 2024

Pengyang Shao, Le Wu, Kun Zhang, Defu Lian, Richang Hong, Yong Li, and Meng Wang. Average user-side counterfactual fairness for collaborative filtering.ACM Transactions on Information Systems, 42(5):1–26, 2024

2024
[31]

Pmg: Personalized multimodal response generation with large language models

Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, and Xi Xiao. Pmg: Personalized multimodal response generation with large language models. InThe Web Conference 2024

2024
[32]

Answering narrative-driven recommendation queries via a retrieve–rank paradigm and the ocg-agent

Yunxiao Shi, Haoning Shang, Xing Zi, Wujiang Xu, Yue Feng, and Min Xu. Answering narrative-driven recommendation queries via a retrieve–rank paradigm and the ocg-agent. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13192–13213, 2025

2025
[33]

An active inference strategy for prompting reliable responses from large language models in medical practice.npj Digital Medicine, 8(1):119, 2025

Roma Shusterman, Allison C Waters, Shannon O’Neill, Marshall Bangs, Phan Luu, and Don M Tucker. An active inference strategy for prompting reliable responses from large language models in medical practice.npj Digital Medicine, 8(1):119, 2025

2025
[34]

Chain-of-strategy planning with llms: Aligning the gener- ation of psychotherapy dialogue with strategy in motivational interviewing.arXiv preprint arXiv:2408.06527, 2024

Xin Sun, Xiao Tang, Abdallah El Ali, Zhuying Li, Xiaoyu Shen, Pengjie Ren, Jan de Wit, Jiahuan Pei, and Jos A Bosch. Chain-of-strategy planning with llms: Aligning the gener- ation of psychotherapy dialogue with strategy in motivational interviewing.arXiv preprint arXiv:2408.06527, 2024

work page arXiv 2024
[35]

Enhancing personalized dialogue generation with contrastive latent variables.arXiv preprint arXiv:2305.11482, 2023

Yihong Tang, Bo Wang, Miao Fang, Dongming Zhao, Kun Huang, Ruifang He, and Yuexian Hou. Enhancing personalized dialogue generation with contrastive latent variables.arXiv preprint arXiv:2305.11482, 2023

work page arXiv 2023
[36]

Msl: Not all tokens are what you need for tuning llm as a recommender

Bohao Wang, Feng Liu, Jiawei Chen, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Yan Feng, Chun Chen, and Can Wang. Msl: Not all tokens are what you need for tuning llm as a recommender. InProceedings of the 48th international ACM SIGIR conference on research and development in information retrieval, pages 1912–1922, 2025

1912
[37]

Think-while-generating: On-the-fly reasoning for personalized long-form generation.arXiv preprint arXiv:2512.06690, 2025

Chengbing Wang, Yang Zhang, Wenjie Wang, Xiaoyan Zhao, Fuli Feng, Xiangnan He, and Tat-Seng Chua. Think-while-generating: On-the-fly reasoning for personalized long-form generation.arXiv preprint arXiv:2512.06690, 2025

work page arXiv 2025
[38]

Rlpf: Reinforcement learning from prediction feedback for user summarization with llms

Jiaxing Wu, Lin Ning, Luyang Liu, Harrison Lee, Neo Wu, Chao Wang, Sushant Prakash, Shawn O’Banion, Bradley Green, and Jun Xie. Rlpf: Reinforcement learning from prediction feedback for user summarization with llms. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 25488–25496, 2025

2025
[39]

Humanlm: Simulating users with state alignment beats response imitation.arXiv preprint arXiv:2603.03303, 2026

Shirley Wu, Evelyn Choi, Arpandeep Khatua, Zhanghan Wang, Joy He-Yueya, Tharindu Cyril Weerasooriya, Wei Wei, Diyi Yang, Jure Leskovec, and James Zou. Humanlm: Simulating users with state alignment beats response imitation.arXiv preprint arXiv:2603.03303, 2026

work page arXiv 2026
[40]

Anno-mi: A dataset of expert-annotated counselling dialogues

Zixiu Wu, Simone Balloccu, Vivek Kumar, Rim Helaoui, Ehud Reiter, Diego Reforgiato Recu- pero, and Daniele Riboni. Anno-mi: A dataset of expert-annotated counselling dialogues. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6177–6181. IEEE, 2022

2022
[41]

Few-shot dialogue strategy learning for motiva- tional interviewing via inductive reasoning

Zhouhang Xie, Bodhisattwa Prasad Majumder, Mengjie Zhao, Yoshinori Maeda, Keiichi Yamada, Hiromi Wakaki, and Julian McAuley. Few-shot dialogue strategy learning for motiva- tional interviewing via inductive reasoning. InFindings of the Association for Computational Linguistics: ACL 2024, pages 13207–13219, 2024

2024
[42]

Gajos, and Dakuo Zhang

Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Krzysztof Z. Gajos, and Dakuo Zhang. Crafting personalized agents through retrieval-augmented generation on editable memory graphs. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2024
[43]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025. 12

work page internal anchor Pith review Pith/arXiv arXiv 2025
[44]

Reliable and responsible foundation models.Transactions on Machine Learning Research, 2025

Xinyu Yang, Junlin Han, Rishi Bommasani, Jinqi Luo, Wenjie Qu, Wangchunshu Zhou, Adel Bibi, Xiyao Wang, Jaehong Yoon, Elias Stengel-Eskin, et al. Reliable and responsible foundation models.Transactions on Machine Learning Research, 2025

2025
[45]

Cami: A counselor agent sup- porting motivational interviewing through state inference and topic exploration

Yizhe Yang, Palakorn Achananuparp, He-Yan Huang, Jing Jiang, Phey Ling Kit, Nicholas Gabriel Lim, Cameron Tan Shi Ern, and Ee-Peng Lim. Cami: A counselor agent sup- porting motivational interviewing through state inference and topic exploration. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Paper...

2025
[46]

Consistent client simulation for motivational interviewing-based counseling

Yizhe Yang, Palakorn Achananuparp, He-Yan Huang, Jing Jiang, Nicholas Gabriel Lim, Cameron Tan Shi Ern, Phey Ling Kit, Jenny Giam Xiuhui, John Pinto, and Ee-peng Lim. Consistent client simulation for motivational interviewing-based counseling. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)...

2025
[47]

Memweaver: A hierarchical memory from textual interactive behaviors for personalized genera- tion

Shuo Yu, Mingyue Cheng, Daoyu Wang, Qi Liu, Zirui Liu, Ze Guo, and Xiaoyu Tao. Memweaver: A hierarchical memory from textual interactive behaviors for personalized genera- tion. InProceedings of the ACM Web Conference 2026, pages 6920–6931, 2026

2026
[48]

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, and Wenya Wang. Memskill: Learning and evolving memory skills for self-evolving agents.arXiv preprint arXiv:2602.02474, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[49]

Prime: Large language model personalization with cognitive dual-memory and personalized thought process

Xinliang Frederick Zhang, Nick Beauchamp, and Lu Wang. Prime: Large language model personalization with cognitive dual-memory and personalized thought process. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33695– 33724, 2025

2025
[50]

Reinforced latent reasoning for llm-based recommendation

Yang Zhang, Wenxin Xu, Xiaoyan Zhao, Wenjie Wang, Fuli Feng, Xiangnan He, and Tat-Seng Chua. Reinforced latent reasoning for llm-based recommendation. ICLR, 2026

2026
[51]

Nextmem: Towards latent factual memory for llm-based agents

Zeyu Zhang, Rui Li, Xiaoyan Zhao, Yang Zhang, Wenjie Wang, Xu Chen, and Tat-Seng Chua. Nextmem: Towards latent factual memory for llm-based agents. arXiv preprint arXiv:2603.15634, 2026

2026
[52]

Explicit vs implicit memory: Exploring multi-hop complex reasoning over personalized information

Zeyu Zhang, Yang Zhang, Haoran Tan, Rui Li, and Xu Chen. Explicit vs implicit memory: Exploring multi-hop complex reasoning over personalized information. In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, pages 1964–1974, 2026

2026
[53]

Exploring the impact of personality traits on conversational recommender systems: A simulation with large language models

Xiaoyan Zhao, Yang Deng, Wenjie Wang, Hong Cheng, Rui Zhang, See-Kiong Ng, Tat-Seng Chua, et al. Exploring the impact of personality traits on conversational recommender systems: A simulation with large language models. arXiv preprint arXiv:2504.12313, 2025

2025
[54]

Steerx: Disentangled steering for llm personalization

Xiaoyan Zhao, Ming Yan, Yilun Qiu, Haoting Ni, Yang Zhang, Fuli Feng, Hong Cheng, and Tat-Seng Chua. Steerx: Disentangled steering for llm personalization. arXiv preprint arXiv:2510.22256, 2025

2025
[55]

Reinforced strategy optimization for conversational recommender systems via network-of-experts

Xiaoyan Zhao, Ming Yan, Yang Zhang, Yang Deng, Jian Wang, Fengbin Zhu, Yilun Qiu, Hong Cheng, and Tat-Seng Chua. Reinforced strategy optimization for conversational recommender systems via network-of-experts. arXiv e-prints, pages arXiv–2509, 2025

2025
[56]

Nextquill: Causal preference modeling for enhancing llm personalization

Xiaoyan Zhao, Juntao You, Yang Zhang, Wenjie Wang, Hong Cheng, Fuli Feng, See-Kiong Ng, and Tat-Seng Chua. Nextquill: Causal preference modeling for enhancing llm personalization. ICLR, 2026

2026
[57]

Memorybank: Enhancing large language models with long-term memory

Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 19724–19731, 2024.

2024

A Evaluation Metrics

We evaluate the framework from three perspectives: static state inference, dynamic counseling effectiv...
1. Expectation Generation: from q(s_{t-1}), a_{t-1}, transition prior p, predict the structural regularities of the next move
2. Observation & Surprise Calculation: assimilate o_t; compute the structural prediction error vs Step 1 expectations as non-semantic regularities
3. State Update: update q to absorb the surprise; q_delta arrays may contain MULTIPLE add/modify/drop entries simultaneously.

## Strict Rules
- Single valid JSON; no markdown fences; escape internal " as \".
- cue tokens are short snake_case identifiers.
- inference_entropy in [0,1].

## State Boundary Rules
- precontemplation: denies/minimizes the problem, e...
1. Acknowledgment, even reluctant, -> contemplation, not precontemplation
2. Tentative planning ("could", "should", "I’ll try") -> preparation
3. Brief responses ("yeah", "I don’t know") inherit context; do not regress.

## Update Operations (q_delta)
Each operation is factored into:

| Component | Definition | Examples |
| --- | --- | --- |
| transition_semantic | abstract latent transition type | resistance_to_engagement |
| behavior_pattern | de-semanticized observable pattern | short_response_with_hedging |
| state_...
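The three-step update in the prompt above (expectation generation, surprise calculation, state update) has a simple numeric analogue in a discrete Bayesian filter. The sketch below is only that analogue, not the paper's LLM-prompted procedure: the transition matrix, the likelihood vector, and the choice of normalized Shannon entropy for `inference_entropy` are all assumptions made for illustration.

```python
import math

STATES = ["precontemplation", "contemplation", "preparation"]

def predict(q_prev, transition):
    """Step 1: push the previous belief through an action-conditioned transition prior."""
    n = len(q_prev)
    return [sum(q_prev[i] * transition[i][j] for i in range(n)) for j in range(n)]

def surprise(q_pred, likelihood):
    """Step 2: negative log evidence of the observation under the predicted belief."""
    evidence = sum(p * l for p, l in zip(q_pred, likelihood))
    return -math.log(evidence)

def update(q_pred, likelihood):
    """Step 3: Bayes rule, absorbing the observation into the belief."""
    unnorm = [p * l for p, l in zip(q_pred, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def inference_entropy(q):
    """Shannon entropy divided by log(len(q)), so the value lies in [0, 1] (assumed definition)."""
    h = -sum(p * math.log(p) for p in q if p > 0)
    return h / math.log(len(q))

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
q_prev = [0.6, 0.3, 0.1]
transition = [  # rows: state at t-1, columns: state at t, for one fixed action a_{t-1}
    [0.7, 0.3, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
]
likelihood = [0.1, 0.7, 0.2]  # p(o_t | s_t) for the observed cue

q_pred = predict(q_prev, transition)   # expectation before seeing o_t
s = surprise(q_pred, likelihood)       # prediction error for o_t
q_post = update(q_pred, likelihood)    # belief after absorbing o_t
print(STATES[q_post.index(max(q_post))], round(inference_entropy(q_post), 3))
```

With these made-up numbers the predicted belief is (0.45, 0.38, 0.17) and the posterior mass concentrates on contemplation, which is the kind of shift the boundary rules describe for a reluctant acknowledgment.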
1. What do you still need to learn about this patient? (exploration)
2. What action would best advance the patient toward change? (exploitation)
3. Avoid repeating the same action -- try different approaches.

Return ONLY the action name, nothing else.

G.1.4 Counselor Response Generator

Counselor Response Generator: System Prompt

As a communication expert with outstanding communication habits, you embody the role of {agent_name} throughout the following dialogues. Here are some of your distinctive per...
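The exploration and exploitation questions in the action-selection prompt correspond to the epistemic and pragmatic terms of expected free energy. A minimal sketch of that trade-off for a discrete model follows. It assumes a standard active-inference form (negative expected information gain minus expected log preference); the action names, matrices, and preference values are hypothetical and do not come from the paper.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_free_energy(q, transition, obs_model, log_pref):
    """G(a) = -(epistemic value) - (pragmatic value) for one candidate action."""
    n = len(q)
    # Belief over the next state if this action is taken.
    q_next = [sum(q[i] * transition[i][j] for i in range(n)) for j in range(n)]
    n_obs = len(obs_model[0])
    # Predicted distribution over the next observation.
    p_obs = [sum(q_next[s] * obs_model[s][o] for s in range(n)) for o in range(n_obs)]
    # Epistemic value: mutual information between next state and next observation.
    epistemic = entropy(p_obs) - sum(q_next[s] * entropy(obs_model[s]) for s in range(n))
    # Pragmatic value: expected log preference over next states.
    pragmatic = sum(q_next[s] * log_pref[s] for s in range(n))
    return -epistemic - pragmatic

def select_action(q, transitions, obs_model, log_pref):
    """Return the action with the lowest expected free energy, plus all scores."""
    scores = {a: expected_free_energy(q, t, obs_model, log_pref)
              for a, t in transitions.items()}
    return min(scores, key=scores.get), scores

# Hypothetical example: two made-up actions over three stages of change.
q = [0.5, 0.4, 0.1]
obs_model = [[0.8, 0.15, 0.05], [0.2, 0.6, 0.2], [0.05, 0.15, 0.8]]
transitions = {
    "explore_concerns": [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],
    "support_planning": [[0.6, 0.4, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
}
log_pref = [math.log(0.1), math.log(0.3), math.log(0.6)]  # later stages preferred

best_action, scores = select_action(q, transitions, obs_model, log_pref)
print(best_action, {a: round(g, 3) for a, g in scores.items()})
```

The "avoid repeating the same action" instruction has no counterpart in this sketch; one possible treatment, again an assumption, would be a small penalty added to the score of the most recent action.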
1. `current_state`: the label of the CURRENT patient utterance
2. `next_state`: the label the patient is MOST likely to express in their NEXT utterance after the doctor’s latest reply.

Output JSON exactly as:
{"current_state":"precontemplation|contemplation|preparation", "next_state": "precontemplation|contemplation|preparation", "rationale": "<brief>"}

Full history: {history_text}
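Because the labeling prompt demands an exact JSON shape, a caller would likely want to validate the model's reply before using it. The following is a minimal sketch of such a check; the function name `parse_state_labels` and the decision to raise `ValueError` on any violation are assumptions, not part of the paper.

```python
import json

# The three labels listed in the prompt's output template.
ALLOWED_STATES = {"precontemplation", "contemplation", "preparation"}

def parse_state_labels(raw: str) -> dict:
    """Parse a labeler reply and check it against the prompt's JSON template.

    Hypothetical helper: raises ValueError if the reply is not valid JSON,
    is not an object, or carries a label outside the allowed set.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key in ("current_state", "next_state"):
        value = data.get(key)
        if not isinstance(value, str) or value not in ALLOWED_STATES:
            raise ValueError(f"{key} must be one of {sorted(ALLOWED_STATES)}")
    if not isinstance(data.get("rationale"), str):
        raise ValueError("rationale must be a string")
    return data

reply = '{"current_state": "contemplation", "next_state": "preparation", "rationale": "tentative plan"}'
labels = parse_state_labels(reply)
print(labels["current_state"], "->", labels["next_state"])
```

Rejecting malformed replies early keeps a mislabeled turn from silently entering the state-transition statistics.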
