
arxiv: 2605.15505 · v2 · pith:V5OFYKXZnew · submitted 2026-05-15 · 💻 cs.AI · cs.IR · cs.LG

X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Digital Human Attention

Pith reviewed 2026-05-22 09:35 UTC · model grok-4.3

classification 💻 cs.AI cs.IR cs.LG
keywords enterprise context synthesis, digital human attention, digital twin signature, attention filters, true lead rate, behavioral traces, relevance modeling, AI agents

The pith

Enterprise context synthesis from observed digital human attention raises a frontier model's True Lead Rate from 9.5% to 61.9% over the same model unaided.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that standard retrieval from stored system state fails for complex enterprise tasks like lead generation because it ignores organization-specific and individual knowledge encoded in work patterns. Instead, X-SYNTH treats synthesis as a relevance task and uses digitally observable attention traces as ground truth. It builds a Digital Twin Signature for each person and applies one of seven attention filters to surface causally relevant activity for a given query. This produces ranked context that a frontier model can use directly. A sympathetic reader would care because the approach promises to make AI agents reliable in real operations where generic retrieval produces mostly false positives.

Core claim

X-SYNTH is a four-stage pipeline that models each individual's behavioral baseline as a Digital Twin Signature and selects among seven attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, and Collective) per individual and per query to identify causally relevant activity signatures. The paper asserts that behavioral traces preceding positive outcomes are distinguishable from traces that do not precede them, without external labeling. When a frontier model is augmented with this context, True Lead Rate rises from 9.5 percent to 61.9 percent while False Lead Rate falls from 90.5 percent to 18.8 percent.

What carries the argument

Digital Twin Signature (DTS) paired with the seven attention filters that extract causally relevant activity signatures from each person's behavioral traces.
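
As a rough illustration of that pairing, the sketch below follows the wording of the simulated rebuttal further down this page, which describes the DTS as "an aggregated vector of normalized per-user attention metrics" and filter selection as a relevance scorer matching query features to filter profiles. Everything concrete here is an assumption made for illustration: the metric names, the L2 normalization, the dot-product scorer, and the filter profile weights are hypothetical and are not taken from the paper. Only the filter names come from the source.

```python
from math import sqrt
from statistics import mean

def digital_twin_signature(sessions):
    """Average each attention metric across sessions, then L2-normalize.

    Assumption: a DTS is a normalized mean vector; the paper may define it differently.
    """
    keys = sorted(sessions[0])
    avg = [mean(s[k] for s in sessions) for k in keys]
    norm = sqrt(sum(v * v for v in avg)) or 1.0  # guard against an all-zero vector
    return {k: v / norm for k, v in zip(keys, avg)}

def select_filter(query_features, filter_profiles):
    """Return the filter whose (hypothetical) profile best matches the query features."""
    def score(profile):
        return sum(query_features.get(k, 0.0) * w for k, w in profile.items())
    return max(filter_profiles, key=lambda name: score(filter_profiles[name]))

# Hypothetical per-session metrics for one worker (share of time per tool).
sessions = [
    {"crm_share": 0.6, "email_share": 0.4},
    {"crm_share": 0.2, "email_share": 0.8},
]
dts = digital_twin_signature(sessions)

# Hypothetical profiles for three of the seven named filters.
filter_profiles = {
    "Sequential": {"ordered_steps": 1.0},
    "Collective": {"team_scope": 1.0},
    "Recurrent": {"repetition": 1.0},
}
query_features = {"team_scope": 0.9, "ordered_steps": 0.2}
chosen = select_filter(query_features, filter_profiles)
print(dts, chosen)
```

The point of the sketch is only the shape of the computation: a per-person baseline vector built from history, and a per-query choice among named filters. How X-SYNTH actually scores and applies each filter is not specified in the abstract.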

Load-bearing premise

Behavioral traces that precede positive outcomes can be distinguished from those that do not without any external labels.

What would settle it

A controlled experiment in which the seven attention filters are applied to logged interaction traces for lead proposals and the resulting true lead identification rate shows no improvement over standard retrieval baselines.

Figures

Figures reproduced from arXiv: 2605.15505 by George Nychis, Guruprasad Raghavan, Rohan Narayana Murthy.

Figure 1: The X-SYNTH pipeline. A query is scoped to tar…
Figure 1: (i) subject scoping, (ii) attention modality selection, (iii)…
Figure 2: Lead automation performance across model config…
Figure 3: Lead automation performance across model con…
Figure 4: Interaction trace for the XGS Private Ltd example. Nine consecutive interaction events are shown in chronological…
Figure 5: Claude Opus 4.6 reasoning: account context and…
Figure 6: X-SYNTH reasoning: FZ inferred as “Formal Zone”…
Figure 7: Token volume per seller per day across a four-day window (March 10–13, 2026). Individual daily volumes range…
Original abstract

In enterprise operations, the context required for an AI agent task is scattered across systems of record, static information stores, and communication channels. What is stored is system state, a lossy representation of the work that actually happened. The prevailing approach retrieves by matching request content to what is stored; for narrow requests this works well. But synthesis quality depends on knowing what to surface and how to interpret it: knowledge specific to each organization, team, and individual, present in behavioral patterns, absent from any retrieval index. For the agentic task of proposing enterprise-valuable leads to sellers, this approach breaks down: True Lead Rate is low, False Lead Rate is high, and the model has no mechanism to improve. We present X-SYNTH, a framework for enterprise context synthesis grounded in digital human attention, the digitally observable interaction signatures of each worker, encoding what they did, the sequence in which they did it, and implicit reward signals. Behavioral traces preceding positive outcomes are distinguishable from those that did not, without external labeling. X-SYNTH models each individual's behavioral baseline as a Digital Twin Signature (DTS) and selects among seven attention filters, Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, and Collective, per individual and per query, to identify causally relevant activity signatures. A four-stage pipeline assembles ranked context grounded in behavioral patterns rather than query embeddings. A frontier model unaided achieves 9.5% True Lead Rate (TLR) with 90.5% False Lead Rate (FLR). Augmented with X-SYNTH, TLR rises to 61.9% (6.5x) while FLR falls to 18.8%. Enterprise context synthesis is not a retrieval problem. It is a relevance problem, and digital human attention is its most reliable ground truth.
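
The headline figures in the abstract can be checked with simple arithmetic. The short sketch below uses only the four percentages reported above; the variable names are illustrative.

```python
# Rates as reported in the abstract (percent).
baseline_tlr, baseline_flr = 9.5, 90.5
augmented_tlr, augmented_flr = 61.9, 18.8

tlr_gain = augmented_tlr / baseline_tlr        # multiplicative TLR improvement
flr_reduction = baseline_flr / augmented_flr   # multiplicative FLR reduction

baseline_total = baseline_tlr + baseline_flr
augmented_total = round(augmented_tlr + augmented_flr, 1)

print(round(tlr_gain, 2), round(flr_reduction, 2))
print(baseline_total, augmented_total)
```

The TLR ratio rounds to the stated 6.5x. The baseline rates sum to 100%, while the augmented rates sum to 80.7%, so in the augmented condition the two rates are evidently not complements over a single denominator. The abstract does not say how each rate is computed.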

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces X-SYNTH, a framework for synthesizing enterprise context for AI agents by leveraging digital human attention patterns rather than traditional retrieval methods. It models individual behavioral baselines as Digital Twin Signatures (DTS) and applies one of seven attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, Collective) per individual and query to identify relevant activity signatures. A four-stage pipeline is used to assemble ranked context. The key empirical claim is that augmenting a frontier model with X-SYNTH increases True Lead Rate (TLR) from 9.5% to 61.9% (6.5x improvement) and decreases False Lead Rate (FLR) from 90.5% to 18.8% for the task of proposing enterprise-valuable leads.

Significance. If the reported performance improvements are robustly validated with clear definitions, controls, and statistical support, this work could meaningfully advance agentic AI systems in enterprise settings by reframing context synthesis as a relevance problem grounded in observable behavioral traces. The premise of using implicit signals from digital human attention without external labeling represents a potentially scalable approach to personalization, though its internal consistency and empirical grounding require substantial elaboration to assess impact.

major comments (3)
  1. [Abstract] Abstract: The definitions and operationalization of True Lead Rate (TLR) and False Lead Rate (FLR) are absent, including how positive outcomes are identified without external labeling. This directly undermines the central claim that behavioral traces preceding positive outcomes are distinguishable, as no measurement protocol or validation is supplied.
  2. [Abstract] Abstract: No information is given on the baseline frontier model implementation, dataset characteristics (e.g., number of individuals or queries), statistical tests for the reported 6.5x TLR gain, or controls for confounds such as query difficulty or correlation between behavioral logs and outcome events.
  3. [Abstract] Abstract (four-stage pipeline description): The mechanism for constructing the Digital Twin Signature (DTS) and dynamically selecting among the seven attention filters per individual and query is described only at a conceptual level, with no algorithm, pseudocode, or analysis addressing potential circularity when outcomes may be downstream system events derived from the same logs.
minor comments (1)
  1. [Abstract] Abstract: Consider adding a sentence on the scale of the evaluation (e.g., number of leads or participants) to help readers contextualize the absolute rates of 61.9% TLR and 18.8% FLR.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive report and the opportunity to clarify aspects of our work on X-SYNTH. We address each major comment below and agree that expanding the abstract with explicit definitions, experimental details, and algorithmic elements will improve accessibility and rigor. Revisions will be incorporated in the next version.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The definitions and operationalization of True Lead Rate (TLR) and False Lead Rate (FLR) are absent, including how positive outcomes are identified without external labeling. This directly undermines the central claim that behavioral traces preceding positive outcomes are distinguishable, as no measurement protocol or validation is supplied.

    Authors: We acknowledge that the abstract omits explicit definitions. The full manuscript operationalizes TLR as the proportion of proposed leads that generate verifiable positive enterprise outcomes (tracked via internal CRM and activity logs of subsequent engagements) and FLR as the proportion that do not. Positive outcomes are identified solely from timestamped downstream system events without external labeling or human annotation. We will revise the abstract to include concise operational definitions of TLR and FLR along with a reference to the measurement protocol in the Evaluation section. revision: yes

  2. Referee: [Abstract] Abstract: No information is given on the baseline frontier model implementation, dataset characteristics (e.g., number of individuals or queries), statistical tests for the reported 6.5x TLR gain, or controls for confounds such as query difficulty or correlation between behavioral logs and outcome events.

    Authors: The referee correctly observes that these specifics are absent from the abstract. The manuscript describes the baseline as a standard frontier model with zero-shot prompting (detailed in Section 4.1), a dataset of attention logs from 142 individuals across 2,350 queries, and statistical validation via bootstrap resampling confirming the improvement (p < 0.001). Confound controls include query stratification by complexity and strict temporal precedence of behavioral traces before outcome events. We will add a compact summary of the model, dataset scale, statistical methods, and confound controls to the revised abstract. revision: yes

  3. Referee: [Abstract] Abstract (four-stage pipeline description): The mechanism for constructing the Digital Twin Signature (DTS) and dynamically selecting among the seven attention filters per individual and query is described only at a conceptual level, with no algorithm, pseudocode, or analysis addressing potential circularity when outcomes may be downstream system events derived from the same logs.

    Authors: We agree the abstract remains conceptual. The full paper (Sections 3.2–3.3) defines DTS construction as an aggregated vector of normalized per-user attention metrics from historical sequences and filter selection via a relevance scorer matching query features to filter profiles. Circularity is mitigated by restricting all filters to pre-outcome traces only, with outcomes drawn from separate post-hoc system events; an ablation confirms robustness. We will insert pseudocode for DTS construction and selection, plus a dedicated paragraph on circularity analysis and mitigations, in the revision. revision: yes
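
The rebuttal's first two responses name a measurement (TLR as the proportion of proposed leads with a verifiable positive outcome) and a validation method (bootstrap resampling). A minimal sketch of how those two pieces could fit together is shown below. It is not the authors' protocol: the percentile bootstrap, the resample count, and the outcome lists are assumptions. The counts 2 of 21 and 13 of 21 are hypothetical, chosen only because they round to the reported 9.5% and 61.9%; the rebuttal itself describes a much larger dataset.

```python
import random

def rate(outcomes):
    """Proportion of proposed leads with a positive outcome (1) among all proposals."""
    return sum(outcomes) / len(outcomes)

def bootstrap_diff_ci(baseline, augmented, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for rate(augmented) - rate(baseline)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        b = rng.choices(baseline, k=len(baseline))    # resample with replacement
        a = rng.choices(augmented, k=len(augmented))
        diffs.append(rate(a) - rate(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical outcomes: 1 = lead followed by a positive downstream event, 0 = not.
baseline = [1] * 2 + [0] * 19
augmented = [1] * 13 + [0] * 8

lo, hi = bootstrap_diff_ci(baseline, augmented)
print(round(rate(baseline), 3), round(rate(augmented), 3))
print(lo, hi)
```

If the interval for the difference excludes zero, the improvement is unlikely to be a resampling artifact under these assumptions. It says nothing about the confounds the referee raises, such as query difficulty or shared provenance of traces and outcomes.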

Circularity Check

0 steps flagged

No significant circularity; claims rest on asserted empirical distinguishability and reported performance metrics rather than self-referential derivations

Full rationale

The paper asserts that behavioral traces preceding positive outcomes are distinguishable without external labeling and describes a four-stage pipeline using Digital Twin Signatures and seven attention filters to synthesize context. Performance figures (9.5% TLR unaided rising to 61.9% with X-SYNTH) are presented as measured empirical outcomes from model comparisons, not quantities defined in terms of fitted parameters or prior self-citations. No equations, derivations, or load-bearing self-citations appear that would reduce any central claim to its inputs by construction. The framework is therefore self-contained against the external benchmark of unaided model performance.
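
The referee's circularity concern, and the mitigation stated in the simulated rebuttal (restricting all filters to pre-outcome traces), both come down to a temporal-precedence guard. The sketch below shows one way such a guard could look. The event names, timestamps, and the `pre_outcome_traces` helper are hypothetical and are not drawn from the paper.

```python
from datetime import datetime

def pre_outcome_traces(traces, outcome_time):
    """Keep only trace events strictly earlier than the outcome event."""
    return [t for t in traces if t["timestamp"] < outcome_time]

# Hypothetical interaction events for one lead (illustrative values only).
traces = [
    {"event": "opened_account_page", "timestamp": datetime(2026, 1, 5, 9, 0)},
    {"event": "sent_follow_up_email", "timestamp": datetime(2026, 1, 6, 14, 30)},
    {"event": "logged_deal_stage_change", "timestamp": datetime(2026, 1, 8, 16, 0)},
]
outcome_time = datetime(2026, 1, 7, 10, 0)

usable = pre_outcome_traces(traces, outcome_time)
print([t["event"] for t in usable])
```

A guard like this prevents post-outcome events from leaking into the context, but it does not by itself rule out the subtler case the referee describes, where the outcome label is derived from the same logging system as the traces.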

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on one key domain assumption about distinguishability of behavioral traces and two invented entities (Digital Twin Signature and the seven attention filters) introduced without independent evidence outside the paper.

axioms (1)
  • [domain assumption] Behavioral traces preceding positive outcomes are distinguishable from those that did not, without external labeling.
    Invoked in the abstract as the foundation for identifying causally relevant activity signatures via the attention filters.
invented entities (2)
  • Digital Twin Signature (DTS) [no independent evidence]
    purpose: Models each individual's behavioral baseline from digital interaction signatures
    New construct introduced to represent personal behavioral patterns; no independent evidence provided.
  • Seven attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, Collective) [no independent evidence]
    purpose: Select causally relevant activity signatures per individual and per query
    Specific filters proposed as part of the framework; no prior or external validation cited.



