pith. machine review for the scientific record.

arxiv: 2605.08766 · v1 · submitted 2026-05-09 · 💻 cs.IR · cs.CL

Recognition: no theorem link

UserGPT Technical Report

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:59 UTC · model grok-4.3

classification 💻 cs.IR cs.CL
keywords: user profiling · generative personalization · behavioral simulation · LLM fine-tuning · persona reasoning · data semantization · holistic user models

The pith

UserGPT turns noisy user behavior histories into coherent generative personas using simulation and targeted LLM training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes shifting user profiling from fragmented discriminative models to a generative LLM approach that summarizes long behavioral traces into consistent narratives capturing user evolution. It tackles data scarcity and noise through a simulation engine for realistic trajectories, a semantization module for structured inputs, and a multi-stage training curriculum combining supervised fine-tuning with a dual-filter policy optimization method. If effective, this would enable LLMs to perform holistic persona reasoning that generalizes better to complex and long-tail behaviors while dramatically reducing the volume of stored records.

Core claim

UserGPT is a framework that improves LLM-based persona understanding by generating attributes and summaries from behavioral histories. It relies on a User Behavior Simulation Engine to create complex trajectories, a Data-Centric Semantization module to convert logs into coherent inputs, and a curriculum-driven post-training process with Supervised Fine-Tuning plus Dual-Filter Group Relative Policy Optimization. On the derived HPR-Bench benchmark, the resulting model produces accurate tag predictions and summary generations while compressing the original records substantially and retaining essential information.
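The headline compression figure is easy to sanity-check. A minimal sketch of the arithmetic, with illustrative token counts (the paper's actual unit of measurement is not specified here):

```python
def compression_stats(original_size: int, compressed_size: int):
    """Return (percent saved, reduction factor) for a record store."""
    saved = 1 - compressed_size / original_size
    factor = original_size / compressed_size
    return round(saved * 100, 1), round(factor, 1)

# A hypothetical history of 1,000,000 tokens compressed to 21,000 tokens
# matches the paper's headline figure of up to 97.9%.
pct, factor = compression_stats(1_000_000, 21_000)
print(pct, factor)  # 97.9 47.6
```

In other words, a 97.9% compression rate corresponds to storing roughly one forty-eighth of the original records.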

What carries the argument

The User Behavior Simulation Engine, combined with Data-Centric Semantization and curriculum post-training, equips LLMs to reason over extended, noisy histories.
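The paper's description of the semantization module is not reproduced here, so the following is a hedged sketch of one plausible reading: raw heterogeneous events are grouped and rendered as compact natural-language lines. The event schema and rendering are illustrative, not the authors':

```python
from collections import defaultdict

def semantize(events: list[dict]) -> str:
    """Group raw behavioral events by day and action type, then render
    each group as one compact natural-language line. Illustrative only."""
    grouped = defaultdict(list)
    for e in events:  # e.g. {"day": "2026-05-01", "action": "view", "item": "running shoes"}
        grouped[(e["day"], e["action"])].append(e["item"])
    lines = []
    for (day, action), items in sorted(grouped.items()):
        lines.append(f"{day}: {action} x{len(items)}: {', '.join(sorted(set(items)))}")
    return "\n".join(lines)

log = [
    {"day": "2026-05-01", "action": "view", "item": "running shoes"},
    {"day": "2026-05-01", "action": "view", "item": "trail shoes"},
    {"day": "2026-05-02", "action": "buy",  "item": "running shoes"},
]
print(semantize(log))
# 2026-05-01: view x2: running shoes, trail shoes
# 2026-05-02: buy x1: running shoes
```

Even this toy grouping shows where the noise reduction would come from: duplicate and near-duplicate events collapse into one line before the LLM ever sees them.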

If this is right

  • LLMs become capable of capturing nuanced and implicit aspects of user evolution that discrete attribute models miss.
  • Storage and processing costs for user histories drop sharply while core details remain usable for downstream tasks.
  • Personalized agent interactions can draw on compressed yet logically consistent profiles instead of raw logs.
  • Long-tail and evolving behaviors become easier to model without manual feature engineering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pipeline could be adapted to other domains that involve summarizing sparse event sequences, such as health records or transaction logs.
  • Real-world deployment would require ongoing checks that simulation fidelity does not introduce systematic biases.
  • Future versions might incorporate online updates so personas evolve as new user actions arrive.

Load-bearing premise

The simulated user trajectories are realistic enough that training on them produces models that work on actual human behavioral data.

What would settle it

Running UserGPT on a set of real-world digital traces and measuring agreement between its generated personas and direct user feedback or expert review of those same traces.
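One concrete form such a settling experiment could take: compare model-generated persona tags with human annotations over the same traces and report chance-corrected agreement. Cohen's kappa is a standard choice for this, not one the paper names:

```python
def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two label sequences."""
    assert len(a) == len(b)
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    labels = set(a) | set(b)
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical tags from the model vs. human review of the same traces.
model_tags = ["runner", "runner", "gamer", "runner"]
human_tags = ["runner", "gamer",  "gamer", "runner"]
print(round(cohens_kappa(model_tags, human_tags), 3))  # 0.5
```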

read the original abstract

Personalized user understanding from large-scale digital traces remains a fundamental challenge. Traditional user profiling methods rely on discriminative models and manual feature engineering to predict discrete attributes, often producing fragmented and logically inconsistent profiles that generalize poorly to long-tail behaviors. In this work, we study a generative paradigm in which large language models (LLMs) summarize long and noisy behavioral histories into coherent narratives that capture nuanced user evolution. Our experiments show that even strong LLMs remain limited in complex and implicit personalization reasoning. We propose UserGPT, a framework for improving LLM-based persona understanding through both attribute generation and summary generation. To address the scarcity of real-world behavioral data, we develop a User Behavior Simulation Engine that produces realistic and complex user trajectories. We further introduce a Data-Centric Semantization module that transforms heterogeneous behavioral logs into structured and semantically coherent inputs, reducing noise and sparsity. On top of this pipeline, we design a curriculum-driven post-training strategy that combines multi-stage Supervised Fine-Tuning (SFT) with Dual-Filter Group Relative Policy Optimization (DF-GRPO) to strengthen reasoning over long behavioral histories. We also construct HPR-Bench, a benchmark for holistic persona reasoning derived from simulated data. On HPR-Bench, UserGPT achieves an Avg@10 score of 0.7325 on tag prediction and an $Acc_{Ex}$ score of 0.7528 on summary generation, while compressing behavioral records by up to 97.9% with critical information preserved. These results demonstrate the effectiveness of UserGPT for holistic persona reasoning and personalized user-agent interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents UserGPT, a generative LLM-based framework for holistic persona reasoning from long, noisy user behavioral histories. To address real-data scarcity, it introduces a User Behavior Simulation Engine for generating trajectories, a Data-Centric Semantization module to structure logs, and a curriculum post-training pipeline combining multi-stage SFT with Dual-Filter Group Relative Policy Optimization (DF-GRPO). It constructs HPR-Bench from the same simulated data and reports an Avg@10 score of 0.7325 on tag prediction, an Acc_Ex score of 0.7528 on summary generation, and up to 97.9% compression while preserving critical information.

Significance. If the simulated trajectories prove representative of real user logs, the work could meaningfully advance personalized modeling by demonstrating a scalable generative alternative to fragmented discriminative profiling, with practical value in the reported compression rates for user-agent systems. The curriculum strategy and DF-GRPO offer concrete technical contributions to long-context reasoning. The simulation engine itself is a pragmatic response to data scarcity and could be reusable. Currently, however, the lack of external grounding confines demonstrated gains to an artificial closed loop.

major comments (2)
  1. [Abstract] The central quantitative claims (Avg@10 = 0.7325 on tag prediction and Acc_Ex = 0.7528 on summary generation) are obtained exclusively on HPR-Bench, which is derived from the authors' User Behavior Simulation Engine—the identical source used for training data and hyperparameter tuning. No distributional divergence metrics, human realism ratings, or transfer experiments to an independent real trace corpus are reported, so the scores demonstrate in-distribution performance on a self-generated process rather than improved reasoning on actual behavioral histories.
  2. [Abstract, methods] No baselines (e.g., standard LLM prompting, prior user-profiling models), error bars, or ablation studies isolating the contributions of Data-Centric Semantization, curriculum stages, or DF-GRPO are supplied. This absence makes it impossible to determine whether the reported scores reflect genuine advances or simply the result of tuning within the closed synthetic distribution.
minor comments (1)
  1. [Abstract] The metric Acc_Ex is referenced without an explicit definition or formula in the abstract; adding a brief parenthetical or pointer to its computation would aid readability.

Simulated Author's Rebuttal

2 responses · 2 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below, indicating planned changes to the manuscript where appropriate.

read point-by-point responses
  1. Referee: The central quantitative claims (Avg@10 = 0.7325 on tag prediction and Acc_Ex = 0.7528 on summary generation) are obtained exclusively on HPR-Bench, which is derived from the authors' User Behavior Simulation Engine—the identical source used for training data and hyperparameter tuning. No distributional divergence metrics, human realism ratings, or transfer experiments to an independent real trace corpus are reported, so the scores demonstrate in-distribution performance on a self-generated process rather than improved reasoning on actual behavioral histories.

    Authors: We agree that all reported results are obtained on trajectories generated by the User Behavior Simulation Engine, which is also used to create training data. This design is motivated by the scarcity of publicly available, large-scale real user behavioral histories suitable for LLM training and evaluation. The engine is constructed to produce complex, noisy, and long-tail trajectories that mirror real-world characteristics, enabling controlled study of holistic persona reasoning. We acknowledge that this constitutes a closed synthetic loop and does not provide direct evidence of generalization to external real traces. In revision, we will add an explicit limitations subsection clarifying the synthetic nature of HPR-Bench, the motivation for simulation, and the scope of our claims. We will also report any internal distributional similarity metrics between simulated and real logs that are available from our development process. revision: partial
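A minimal version of the distributional check the rebuttal gestures at: Jensen-Shannon divergence between feature histograms of simulated and real logs. The chosen feature (session-length buckets) and the numbers are assumptions for illustration, not the authors' protocol:

```python
import math

def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Histograms over, say, session-length buckets (illustrative numbers).
simulated = [0.50, 0.30, 0.15, 0.05]
real      = [0.45, 0.30, 0.18, 0.07]
print(round(js_divergence(simulated, real), 4))  # small value => distributions are close
```

Reporting a few such divergences over salient log features would be a cheap first step toward the external grounding the referee asks for.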

  2. Referee: No baselines (e.g., standard LLM prompting, prior user-profiling models), error bars, or ablation studies isolating the contributions of Data-Centric Semantization, curriculum stages, or DF-GRPO are supplied. This absence makes it impossible to determine whether the reported scores reflect genuine advances or simply the result of tuning within the closed synthetic distribution.

    Authors: We accept that the current version omits baselines, error bars, and component ablations, which limits assessment of incremental contributions. In the revised manuscript we will add: (1) baseline results from standard prompting (zero-shot and few-shot) of the base LLM; (2) comparisons against representative prior user-profiling methods where feasible; (3) systematic ablations that isolate the Data-Centric Semantization module, individual curriculum SFT stages, and the DF-GRPO objective; and (4) error bars from multiple random seeds for all key metrics. These additions will allow readers to evaluate the specific impact of each proposed element. revision: yes
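The promised error bars are cheap to report. A sketch of mean plus or minus sample standard deviation across seeds, with made-up seed scores for illustration:

```python
import statistics

def seed_summary(scores: list[float]) -> str:
    """Report mean +/- sample standard deviation across random seeds."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample std, n-1 denominator
    return f"{mean:.4f} +/- {sd:.4f} (n={len(scores)})"

# Hypothetical tag-prediction scores from five training seeds.
print(seed_summary([0.7301, 0.7348, 0.7290, 0.7355, 0.7331]))
```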

standing simulated objections not resolved
  • Transfer experiments to independent real user trace corpora: none were run, since no suitable external real datasets were available or used in this study.
  • Human realism ratings or external validation of simulated trajectory fidelity beyond the internal design criteria of the simulation engine.

Circularity Check

0 steps flagged

No significant circularity; the reported results are straightforward empirical measurements on internally generated synthetic data.

full rationale

The paper develops a User Behavior Simulation Engine to generate trajectories due to acknowledged real-data scarcity, applies Data-Centric Semantization and DF-GRPO training on that data, constructs HPR-Bench from the same simulated distribution, and reports measured scores (Avg@10 = 0.7325, Acc_Ex = 0.7528, 97.9% compression). These are explicit empirical evaluations rather than a derivation or prediction that reduces to the inputs by construction. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are present that would make the reported metrics tautological. The framework is self-contained as a practical pipeline for the synthetic setting; lack of external real-trace validation is a generalization concern, not a circularity in the claimed chain.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The central claims rest on the validity of simulated trajectories as proxies for real behavior and on the assumption that staged LLM post-training yields generalizable persona reasoning; several new modules are introduced without external validation.

free parameters (2)
  • Curriculum stage counts and DF-GRPO hyperparameters
    Chosen to strengthen long-history reasoning; values not reported but required for the training pipeline.
  • Simulation engine parameters controlling trajectory complexity
    Tuned to produce realistic yet complex behaviors used for both training and benchmarking.
axioms (2)
  • domain assumption Large language models can produce coherent, logically consistent user personas from noisy behavioral histories when given appropriate training data and objectives.
    Invoked as the core generative paradigm replacing discriminative profiling.
  • ad hoc to paper Simulated user trajectories are sufficiently representative of real-world distributions to support both training and evaluation.
    Explicitly adopted due to scarcity of real behavioral data.
invented entities (2)
  • Dual-Filter Group Relative Policy Optimization (DF-GRPO) no independent evidence
    purpose: Post-training objective to improve reasoning over long behavioral histories
    New optimization variant introduced without prior reference.
  • Data-Centric Semantization module no independent evidence
    purpose: Convert heterogeneous logs into structured semantic inputs
    New preprocessing component to reduce noise and sparsity.
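The dual-filter mechanism is new to this paper, but the group-relative core follows GRPO (introduced in DeepSeekMath), where each sampled response's advantage is its reward standardized within its sampling group. A sketch of that step, with the two filters left as hypothetical placeholders since the paper's definitions are not reproduced here:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: standardize each reward within its group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid div-by-zero on constant groups
    return [(r - mu) / sigma for r in rewards]

def dual_filter(samples, quality_ok, format_ok):
    """Hypothetical stand-in for DF-GRPO's two filters: keep only samples
    passing both predicates before computing advantages."""
    return [s for s in samples if quality_ok(s) and format_ok(s)]

# Four sampled responses for one prompt, each with a scalar reward.
rewards = [0.2, 0.8, 0.5, 0.5]
print([round(a, 2) for a in group_relative_advantages(rewards)])  # [-1.41, 1.41, 0.0, 0.0]
```

What the two filters actually select on, and how they interact with the advantage estimate, is exactly the kind of detail the ledger flags as lacking independent evidence.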

pith-pipeline@v0.9.0 · 5617 in / 1678 out tokens · 69721 ms · 2026-05-12T02:59:40.318191+00:00 · methodology

discussion (0)

