ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage
Pith reviewed 2026-05-20 22:28 UTC · model grok-4.3
The pith
ORACLE is an agentic framework that anticipates scams from partial streaming app-usage trajectories by consolidating cross-temporal evidence and distilling anti-scam knowledge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ORACLE is the first agentic framework for early scam anticipation from streaming app-usage trajectories. A self-evolving context manager adaptively consolidates entity-centric interactions over time to reconstruct cross-temporal evidence from partial observations. An on-policy self-distillation scheme lets a teacher model conditioned on summarized anti-scam reflections and clues supervise a student model that lacks those reflections, thereby distilling evidence-informed knowledge that improves recognition of emerging fraud patterns from incomplete trajectories.
What carries the argument
A self-evolving context manager that adaptively consolidates entity-centric interactions over time together with an on-policy self-distillation scheme in which a teacher model equipped with anti-scam reflections supervises a student model without them.
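To make the first component more concrete, here is a minimal Python sketch of entity-centric consolidation over a stream of app events. It is an illustration only, not the authors' implementation: the `Event` schema, the `EntityContext` and `ContextManager` classes, the `max_notes` cap, and the keep-the-newest-notes rule are all assumptions made for this example, whereas the paper describes an adaptive, self-evolving manager.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """One observed app interaction (hypothetical schema)."""
    day: int      # day index within the trajectory
    app: str      # app where the interaction happened
    entity: str   # counterpart, e.g. a contact or account id
    note: str     # short description of what happened


@dataclass
class EntityContext:
    """Running evidence kept for a single entity."""
    apps: set = field(default_factory=set)
    first_day: Optional[int] = None
    last_day: Optional[int] = None
    notes: list = field(default_factory=list)


class ContextManager:
    """Groups streaming events by entity so that evidence spread across
    apps and days can be read together. Assumed design, not ORACLE's."""

    def __init__(self, max_notes: int = 3):
        self.max_notes = max_notes
        self.contexts = defaultdict(EntityContext)

    def update(self, event: Event) -> None:
        ctx = self.contexts[event.entity]
        ctx.apps.add(event.app)
        if ctx.first_day is None:
            ctx.first_day = event.day
        ctx.last_day = event.day
        ctx.notes.append(f"d{event.day} {event.app}: {event.note}")
        # Crude stand-in for consolidation: keep only the newest notes.
        ctx.notes = ctx.notes[-self.max_notes:]

    def cross_app_entities(self) -> list:
        """Entities seen in more than one app, a simple cross-temporal cue."""
        return sorted(e for e, c in self.contexts.items() if len(c.apps) > 1)


# Usage: the same counterpart surfaces in three apps over nine days.
stream = [
    Event(1, "sms", "acct_42", "unsolicited job offer"),
    Event(3, "chat", "acct_42", "asks to move to private chat"),
    Event(4, "maps", "cafe_7", "navigation"),
    Event(9, "bank", "acct_42", "requests deposit"),
]
manager = ContextManager(max_notes=2)
for ev in stream:
    manager.update(ev)
print(manager.cross_app_entities())
```

Keying the store by entity rather than by app is the design choice that lets fragmented evidence line up; a real system would replace the truncation step with a learned or model-driven summary.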
If this is right
- Timely warnings become feasible before scam intent is explicit in realistic streaming conditions.
- False alerts decrease while coverage still spans twelve scam types that unfold over horizons averaging fifteen days.
- Fragmented evidence across multiple apps can be reconstructed into usable cross-temporal signals.
- Distilled knowledge from summarized reflections improves sensitivity to latent early-stage fraud patterns.
Where Pith is reading between the lines
- The same consolidation and distillation pattern could be tested on other gradual-intent sequences such as security log analysis or user-behavior prediction.
- Long-horizon datasets that deliberately interleave benign and malicious actions appear necessary for training detectors that must act on partial histories.
- Adding live user feedback loops to the self-distillation process might allow the framework to adapt faster to new scam variants in production.
Load-bearing premise
The curated real-world long-horizon benchmark of streaming app-usage trajectories accurately represents diverse scam behaviors interleaved with normal use across twelve scam types, ninety-five apps, and extended periods.
What would settle it
Running ORACLE on a fresh collection of streaming app-usage trajectories and finding no earlier detection or lower false-alert rate than standard sequence models would falsify the central performance claim.
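One way to run such a comparison is to score every system on the same trajectories for lead time and false-alert rate. The sketch below assumes each trajectory is reduced to a pair: the step at which the system warned (or `None`) and the step at which scam intent became explicit (`None` for benign trajectories). The function name, the record format, and the metric definitions are assumptions for illustration; the paper's own metrics may differ.

```python
from statistics import mean
from typing import List, Optional, Tuple


def score_alerts(records: List[Tuple[Optional[int], Optional[int]]]) -> dict:
    """records holds (alert_step, explicit_step) per trajectory.
    explicit_step is None for benign trajectories; alert_step is None
    when the system never warned."""
    lead_times = []   # steps gained on scams flagged before the explicit step
    false_alerts = 0
    benign = 0
    scams = 0
    for alert_step, explicit_step in records:
        if explicit_step is None:
            benign += 1
            if alert_step is not None:
                false_alerts += 1
        else:
            scams += 1
            if alert_step is not None and alert_step < explicit_step:
                lead_times.append(explicit_step - alert_step)
    return {
        "early_detection_rate": len(lead_times) / scams if scams else 0.0,
        "mean_lead_time": mean(lead_times) if lead_times else 0.0,
        "false_alert_rate": false_alerts / benign if benign else 0.0,
    }
```

Under these definitions, the falsifying outcome described above would be an ORACLE score with no higher early-detection rate or lead time, and no lower false-alert rate, than a standard sequence model scored on the same records.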
Original abstract
Smartphone scams are increasingly prevalent and typically manifest as multi-stage, cross-application processes with gradually emerging intent. Effective intervention thus requires anticipating scams before the intent becomes explicit. This is inherently challenging, as decisions must rely on partial trajectories with temporally distributed evidence. In this paper, we propose ORACLE (Online Reasoning for Anticipating Cross-temporal Latent thrEats), the first agentic framework for early scam anticipation from streaming app-usage trajectories. To support this setting, we curate a real-world long-horizon benchmark of streaming app-usage trajectories, covering 12 scam types, spanning extended periods (15 days on average), involving diverse applications (95 apps), and interleaving normal and scam behaviors. To address fragmented evidence, we introduce a self-evolving context manager that adaptively consolidates entity-centric interactions over time, enabling more effective reconstruction of cross-temporal evidence from partial observations. To enhance sensitivity to latent early-stage signals, we propose an on-policy self-distillation scheme in which a teacher model, conditioned on summarized anti-scam reflections and clues by skills, supervises a student model without access to such reflections. This scheme thereby distills evidence-informed knowledge and improves recognition of emerging fraud patterns from partial trajectories. Experiments show that ORACLE consistently improves early scam anticipation, yielding timely warnings while reducing false alerts in realistic streaming scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ORACLE, an agentic framework for early scam anticipation from partial streaming app-usage trajectories. It curates a real-world long-horizon benchmark covering 12 scam types, 95 apps, and ~15-day spans with interleaved normal and scam behaviors. The core technical contributions are a self-evolving context manager that consolidates entity-centric interactions over time and an on-policy self-distillation scheme in which a teacher conditioned on summarized anti-scam reflections supervises a student model lacking those reflections. Experiments are reported to show consistent gains in early anticipation, timely warnings, and reduced false alerts under realistic streaming conditions.
Significance. If the benchmark labels are free of hindsight bias and the reported gains are statistically robust, the work would represent a meaningful advance in proactive, partial-observation fraud detection. The self-distillation approach from full-context reflections to streaming prefixes is a concrete mechanism for handling temporally distributed evidence and could transfer to other latent-intent tasks. The curated multi-app, multi-week dataset itself would be a useful community resource provided its construction is transparently validated.
major comments (2)
- [Section 3 / experimental setup] Benchmark curation and labeling: the description does not specify whether scam-onset labels for partial trajectories were produced via blinded review, inter-annotator agreement, or a hold-out real-time collection protocol. If labels were assigned after full-sequence inspection or external reports, the measured improvements in timeliness and false-alarm reduction could be artifacts of post-hoc knowledge rather than genuine cross-temporal reasoning from prefixes alone. This directly affects the validity of the central experimental claim.
- [Section 5] Experimental reporting: the abstract and results summary claim consistent improvements but provide no quantitative metrics, baseline definitions, statistical significance tests, or ablation isolating the context manager versus the distillation component. Without these, it is impossible to assess whether the headline gains are load-bearing or merely incremental.
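The significance testing requested in the second major comment could take the form of a paired bootstrap over evaluation trajectories. The sketch below is a generic version of that test and is not taken from the paper; the per-trajectory score lists, the function name, and the resample count are assumptions.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Estimate how often system A fails to beat system B when the
    evaluation trajectories are resampled with replacement.
    Returns (observed mean difference, approximate one-sided p-value)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    rng = random.Random(seed)
    n = len(scores_a)
    # Pairing: both systems are scored on the same trajectory.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / n
    not_better = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return observed, not_better / n_resamples
```

Running the same test on each ablated variant (context manager only, distillation only) against the full system would show whether either component carries the gain on its own.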
minor comments (2)
- [Section 4] Notation for the self-evolving context manager and the on-policy distillation loss should be formalized with explicit equations rather than prose descriptions only.
- [Section 5] The paper should clarify the precise definition of 'early' anticipation (e.g., number of steps or time before explicit scam action) and report per-scam-type breakdowns to support the cross-type claim.
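As an illustration of the formalization requested in the first minor comment, one common way to write an on-policy context-distillation objective is shown below. This is a generic form and not the paper's equation. The notation is assumed: $x_{\le t}$ is the partial trajectory observed up to step $t$, $r$ is the summarized reflections and clues given only to the teacher, $y$ is a response sampled from the student $\pi_\theta$ (which is what makes the scheme on-policy), and $\pi_{\bar\theta}$ is the teacher, here the same model with gradients stopped. The choice of a token-level reverse KL is likewise an assumption.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{x_{\le t} \sim \mathcal{D}}\;
    \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x_{\le t})}
    \left[ \sum_{i=1}^{|y|}
      D_{\mathrm{KL}}\!\left(
        \pi_\theta(\cdot \mid x_{\le t}, y_{<i})
        \,\Big\|\,
        \pi_{\bar\theta}(\cdot \mid x_{\le t}, r, y_{<i})
      \right)
    \right]
```

The only difference between the two conditionals is the extra context $r$, so minimizing the loss pushes the student to behave as if it had seen the reflections while reading the prefix alone.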
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating the revisions we will make to improve clarity and transparency.
- Referee: [Section 3 / experimental setup] Benchmark curation and labeling: the description does not specify whether scam-onset labels for partial trajectories were produced via blinded review, inter-annotator agreement, or a hold-out real-time collection protocol. If labels were assigned after full-sequence inspection or external reports, the measured improvements in timeliness and false-alarm reduction could be artifacts of post-hoc knowledge rather than genuine cross-temporal reasoning from prefixes alone. This directly affects the validity of the central experimental claim.
  Authors: We agree that the labeling protocol for scam-onset in partial trajectories is essential to substantiate the validity of our central claims and to rule out hindsight bias. The current description in Section 3 outlines the benchmark curation at a high level but does not specify the annotation procedure. We will revise Section 3 to provide a complete account of the labeling process, including details on blinded review, inter-annotator agreement, and the hold-out real-time collection protocol used. This addition will clarify that labels reflect genuine cross-temporal reasoning from prefixes alone.
  Revision: yes
- Referee: [Section 5] Experimental reporting: the abstract and results summary claim consistent improvements but provide no quantitative metrics, baseline definitions, statistical significance tests, or ablation isolating the context manager versus the distillation component. Without these, it is impossible to assess whether the headline gains are load-bearing or merely incremental.
  Authors: We appreciate this observation on the need for more rigorous experimental reporting. While Section 5 presents the full experimental results, the abstract and high-level results summary do not include the requested quantitative details. We will revise the abstract and Section 5 to explicitly report quantitative metrics, define all baselines, include statistical significance tests, and add an ablation study that isolates the contributions of the self-evolving context manager and the on-policy self-distillation scheme. These changes will enable a clearer assessment of the gains.
  Revision: yes
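Inter-annotator agreement, which the labeling discussion in this exchange turns on, is commonly reported as Cohen's kappa. Below is a small self-contained sketch, assuming two annotators label the same trajectory prefixes. The label strings are hypothetical, and the paper does not state which agreement statistic it would report.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Agreement expected if each annotator labelled at random
    # with their own marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Reporting kappa separately for early prefixes and for full trajectories would speak directly to the hindsight-bias concern, since agreement that holds only on full sequences would suggest the onset labels depend on later evidence.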
Circularity Check
No circularity: framework relies on independent architectural proposals and curated benchmark
The paper presents ORACLE as an agentic framework introducing a self-evolving context manager and on-policy self-distillation for early scam anticipation from streaming trajectories. No equations, fitted parameters, or derivation chains are described that reduce by construction to their own inputs or outputs. The benchmark curation and experimental claims rest on external data collection rather than self-referential definitions or self-citation load-bearing premises. The approach is self-contained with independent content in its proposed components and empirical evaluation.