UserGPT Technical Report
Pith reviewed 2026-05-12 02:59 UTC · model grok-4.3
The pith
UserGPT turns noisy user behavior histories into coherent generative personas using simulation and targeted LLM training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UserGPT is a framework that improves LLM-based persona understanding by generating attributes and summaries from behavioral histories. It relies on a User Behavior Simulation Engine to create complex trajectories, a Data-Centric Semantization module to convert logs into coherent inputs, and a curriculum-driven post-training process combining Supervised Fine-Tuning with Dual-Filter Group Relative Policy Optimization. On the derived HPR-Bench benchmark, the resulting model produces accurate tag predictions and summaries while compressing the original records substantially and retaining essential information.
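The abstract names three stages but exposes no interface, so the toy sketch below only fixes ideas about how they compose. Every function and value here is an illustrative stand-in, not the paper's API.

```python
# Minimal sketch of the three-stage flow the core claim describes.
# All names are hypothetical stand-ins, not the paper's actual code.

def simulate_trajectory(user_id, n_events=5):
    """Stage 1 stand-in: emit a toy behavioral log (list of event dicts)."""
    return [{"user": user_id, "event": f"click_item_{i}"} for i in range(n_events)]

def semantize(trajectory):
    """Stage 2 stand-in: flatten heterogeneous log entries into one
    coherent text input for the LLM."""
    return " ; ".join(e["event"] for e in trajectory)

def persona_model(text):
    """Stage 3 stand-in for the post-trained LLM: here, a trivial rule
    that tags frequent click behavior."""
    tags = ["frequent_clicker"] if text.count("click") >= 3 else []
    summary = f"User performed {text.count('click')} click events."
    return tags, summary

trajectory = simulate_trajectory("u1")
text = semantize(trajectory)
tags, summary = persona_model(text)
```

The real system replaces the third stub with a model shaped by curriculum SFT and DF-GRPO; the point of the sketch is only the log-to-text-to-persona composition.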
What carries the argument
The User Behavior Simulation Engine combined with Data-Centric Semantization and curriculum post-training, which together equip LLMs to reason over extended, noisy histories.
If this is right
- LLMs become capable of capturing nuanced and implicit aspects of user evolution that discrete attribute models miss.
- Storage and processing costs for user histories drop sharply while core details remain usable for downstream tasks.
- Personalized agent interactions can draw on compressed yet logically consistent profiles instead of raw logs.
- Long-tail and evolving behaviors become easier to model without manual feature engineering.
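The compression claim reduces to simple arithmetic; the token counts below are illustrative, chosen only to reproduce the abstract's "up to 97.9%" figure.

```python
def compression_rate(original_len, compressed_len):
    """Fraction of the original record removed (the figure the
    abstract reports as 'compressing behavioral records by up to 97.9%')."""
    return 1.0 - compressed_len / original_len

# Illustrative numbers only: a 100,000-token history reduced to a
# 2,100-token profile matches the reported rate.
rate = compression_rate(100_000, 2_100)
# rate ~= 0.979
```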
Where Pith is reading between the lines
- The same pipeline could be adapted to other domains that involve summarizing sparse event sequences, such as health records or transaction logs.
- Real-world deployment would require ongoing checks that simulation fidelity does not introduce systematic biases.
- Future versions might incorporate online updates so personas evolve as new user actions arrive.
Load-bearing premise
The simulated user trajectories are realistic enough that training on them produces models that work on actual human behavioral data.
What would settle it
Running UserGPT on a set of real-world digital traces and measuring agreement between its generated personas and direct user feedback or expert review of those same traces.
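One concrete way to score that agreement is chance-corrected concordance between model-generated tags and expert labels for the same traces; Cohen's kappa is a standard choice, sketched here in plain Python. The example labels are invented.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items.
    Categories are arbitrary hashables; pure-Python, no dependencies."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters label identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independent marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

model_tags  = ["sporty", "sporty", "bookish", "bookish", "sporty"]
expert_tags = ["sporty", "bookish", "bookish", "bookish", "sporty"]
kappa = cohens_kappa(model_tags, expert_tags)
```

Values near 1 would indicate the generated personas track expert judgment; values near 0 would indicate agreement no better than chance.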
read the original abstract
Personalized user understanding from large-scale digital traces remains a fundamental challenge. Traditional user profiling methods rely on discriminative models and manual feature engineering to predict discrete attributes, often producing fragmented and logically inconsistent profiles that generalize poorly to long-tail behaviors. In this work, we study a generative paradigm in which large language models (LLMs) summarize long and noisy behavioral histories into coherent narratives that capture nuanced user evolution. Our experiments show that even strong LLMs remain limited in complex and implicit personalization reasoning. We propose UserGPT, a framework for improving LLM-based persona understanding through both attribute generation and summary generation. To address the scarcity of real-world behavioral data, we develop a User Behavior Simulation Engine that produces realistic and complex user trajectories. We further introduce a Data-Centric Semantization module that transforms heterogeneous behavioral logs into structured and semantically coherent inputs, reducing noise and sparsity. On top of this pipeline, we design a curriculum-driven post-training strategy that combines multi-stage Supervised Fine-Tuning (SFT) with Dual-Filter Group Relative Policy Optimization (DF-GRPO) to strengthen reasoning over long behavioral histories. We also construct HPR-Bench, a benchmark for holistic persona reasoning derived from simulated data. On HPR-Bench, UserGPT achieves an Avg@10 score of 0.7325 on tag prediction and an $Acc_{Ex}$ score of 0.7528 on summary generation, while compressing behavioral records by up to 97.9% with critical information preserved. These results demonstrate the effectiveness of UserGPT for holistic persona reasoning and personalized user-agent interaction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents UserGPT, a generative LLM-based framework for holistic persona reasoning from long, noisy user behavioral histories. To address real-data scarcity, it introduces a User Behavior Simulation Engine for generating trajectories, a Data-Centric Semantization module to structure logs, and a curriculum post-training pipeline combining multi-stage SFT with Dual-Filter Group Relative Policy Optimization (DF-GRPO). It constructs HPR-Bench from the same simulated data and reports an Avg@10 score of 0.7325 on tag prediction, an Acc_Ex score of 0.7528 on summary generation, and up to 97.9% compression while preserving critical information.
Significance. If the simulated trajectories prove representative of real user logs, the work could meaningfully advance personalized modeling by demonstrating a scalable generative alternative to fragmented discriminative profiling, with practical value in the reported compression rates for user-agent systems. The curriculum strategy and DF-GRPO offer concrete technical contributions to long-context reasoning. The simulation engine itself is a pragmatic response to data scarcity and could be reusable. Currently, however, the lack of external grounding confines demonstrated gains to an artificial closed loop.
major comments (2)
- [Abstract] Abstract: The central quantitative claims (Avg@10 = 0.7325 on tag prediction and Acc_Ex = 0.7528 on summary generation) are obtained exclusively on HPR-Bench, which is derived from the authors' User Behavior Simulation Engine—the identical source used for training data and hyperparameter tuning. No distributional divergence metrics, human realism ratings, or transfer experiments to an independent real trace corpus are reported, so the scores demonstrate in-distribution performance on a self-generated process rather than improved reasoning on actual behavioral histories.
- [Abstract] Abstract and methods description: No baselines (e.g., standard LLM prompting, prior user-profiling models), error bars, or ablation studies isolating the contributions of Data-Centric Semantization, curriculum stages, or DF-GRPO are supplied. This absence makes it impossible to determine whether the reported scores reflect genuine advances or simply the result of tuning within the closed synthetic distribution.
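The "distributional divergence metrics" the first comment asks for can be as simple as a KL divergence between event-type frequencies of real and simulated logs. A minimal sketch follows; the epsilon smoothing for unseen events is a crude assumption of this sketch, not anything the paper specifies.

```python
import math
from collections import Counter

def kl_divergence(sim_events, real_events, eps=1e-9):
    """KL(real || sim) over event-type frequencies: one simple check of
    whether simulated logs match real ones distributionally. Zero-count
    events get a small eps probability (a rough smoothing choice)."""
    vocab = set(sim_events) | set(real_events)
    p, q = Counter(real_events), Counter(sim_events)
    n_real, n_sim = len(real_events), len(sim_events)
    kl = 0.0
    for event in vocab:
        pr = p[event] / n_real if p[event] else eps
        qr = q[event] / n_sim if q[event] else eps
        kl += pr * math.log(pr / qr)
    return kl

identical = ["view", "click", "buy", "view"]
zero = kl_divergence(identical, identical)   # matching distributions -> 0.0
skewed = kl_divergence(["a", "a", "b"], ["a", "b", "b"])  # mismatch -> positive
```

A report of such divergences between the simulation engine's output and even a small real trace sample would partially address the closed-loop concern.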
minor comments (1)
- [Abstract] The metric Acc_Ex is referenced without an explicit definition or formula in the abstract; adding a brief parenthetical or pointer to its computation would aid readability.
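For illustration only: if Acc_Ex denotes normalized exact-match accuracy — a common reading that the paper may or may not intend — it would compute as below. Both the interpretation and the normalization are assumptions of this sketch, not the paper's definition.

```python
def normalize(text):
    """Lowercase and collapse whitespace; a purely illustrative choice."""
    return " ".join(text.lower().split())

def exact_match_accuracy(predictions, references):
    """Assumed reading of Acc_Ex as exact match after normalization;
    the paper's actual definition is not given in the abstract."""
    assert predictions and len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(predictions)

acc = exact_match_accuracy(
    ["Enjoys outdoor sports", "reads  nightly"],
    ["enjoys outdoor sports", "Reads nightly"],
)
# acc == 1.0 under this normalization
```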
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below, indicating planned changes to the manuscript where appropriate.
read point-by-point responses
- Referee: The central quantitative claims (Avg@10 = 0.7325 on tag prediction and Acc_Ex = 0.7528 on summary generation) are obtained exclusively on HPR-Bench, which is derived from the authors' User Behavior Simulation Engine—the identical source used for training data and hyperparameter tuning. No distributional divergence metrics, human realism ratings, or transfer experiments to an independent real trace corpus are reported, so the scores demonstrate in-distribution performance on a self-generated process rather than improved reasoning on actual behavioral histories.
Authors: We agree that all reported results are obtained on trajectories generated by the User Behavior Simulation Engine, which is also used to create training data. This design is motivated by the scarcity of publicly available, large-scale real user behavioral histories suitable for LLM training and evaluation. The engine is constructed to produce complex, noisy, and long-tail trajectories that mirror real-world characteristics, enabling controlled study of holistic persona reasoning. We acknowledge that this constitutes a closed synthetic loop and does not provide direct evidence of generalization to external real traces. In revision, we will add an explicit limitations subsection clarifying the synthetic nature of HPR-Bench, the motivation for simulation, and the scope of our claims. We will also report any internal distributional similarity metrics between simulated and real logs that are available from our development process. revision: partial
- Referee: No baselines (e.g., standard LLM prompting, prior user-profiling models), error bars, or ablation studies isolating the contributions of Data-Centric Semantization, curriculum stages, or DF-GRPO are supplied. This absence makes it impossible to determine whether the reported scores reflect genuine advances or simply the result of tuning within the closed synthetic distribution.
Authors: We accept that the current version omits baselines, error bars, and component ablations, which limits assessment of incremental contributions. In the revised manuscript we will add: (1) baseline results from standard prompting (zero-shot and few-shot) of the base LLM; (2) comparisons against representative prior user-profiling methods where feasible; (3) systematic ablations that isolate the Data-Centric Semantization module, individual curriculum SFT stages, and the DF-GRPO objective; and (4) error bars from multiple random seeds for all key metrics. These additions will allow readers to evaluate the specific impact of each proposed element. revision: yes
not addressed (2)
- Transfer experiments to independent real user trace corpora, as no suitable external real datasets were available or used in this study.
- Human realism ratings or external validation of simulated trajectory fidelity beyond the internal design criteria of the simulation engine.
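For context on the promised DF-GRPO ablation: standard GRPO (Shao et al., 2024) scores each sampled response against the reward statistics of its own group rather than a learned value function. The paper's dual-filter additions are unspecified, so this sketch shows only the shared group-relative advantage step.

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantages as in standard GRPO: normalize each
    sampled response's reward by its group's mean and std. The paper's
    Dual-Filter variant presumably filters samples or groups before
    this step; those filters are not specified here."""
    mu = statistics.fmean(group_rewards)
    sigma = statistics.pstdev(group_rewards)
    if sigma == 0.0:
        # A degenerate group (all rewards equal) carries no learning signal.
        return [0.0 for _ in group_rewards]
    return [(r - mu) / sigma for r in group_rewards]

adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# mean 0.5, pstdev 0.5 -> [1.0, -1.0, -1.0, 1.0]
```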
Circularity Check
No significant circularity; the reported results are empirical measurements on internally generated synthetic data.
full rationale
The paper develops a User Behavior Simulation Engine to generate trajectories due to acknowledged real-data scarcity, applies Data-Centric Semantization and DF-GRPO training on that data, constructs HPR-Bench from the same simulated distribution, and reports measured scores (Avg@10 = 0.7325, Acc_Ex = 0.7528, 97.9% compression). These are explicit empirical evaluations rather than a derivation or prediction that reduces to the inputs by construction. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are present that would make the reported metrics tautological. The framework is self-contained as a practical pipeline for the synthetic setting; lack of external real-trace validation is a generalization concern, not a circularity in the claimed chain.
Axiom & Free-Parameter Ledger
free parameters (2)
- Curriculum stage counts and DF-GRPO hyperparameters
- Simulation engine parameters controlling trajectory complexity
axioms (2)
- domain assumption: Large language models can produce coherent, logically consistent user personas from noisy behavioral histories when given appropriate training data and objectives.
- ad hoc to paper: Simulated user trajectories are sufficiently representative of real-world distributions to support both training and evaluation.
invented entities (2)
- Dual-Filter Group Relative Policy Optimization (DF-GRPO): no independent evidence
- Data-Centric Semantization module: no independent evidence