Recognition: no theorem link
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems in the Wild
Pith reviewed 2026-05-17 00:34 UTC · model grok-4.3
The pith
ProAgent lets a proactive LLM agent assist users throughout daily life by activating detailed sensing only when low-cost contextual cues indicate a likely need.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ProAgent is an end-to-end system that combines on-demand tiered perception, proactive-oriented context extraction, and a context-aware proactive reasoner to deliver continuous in-the-wild assistance while keeping system overhead low.
What carries the argument
On-demand tiered perception, which pairs continuous low-cost contextual cues with selective activation of richer sensory processing.
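The activation logic is not spelled out in this summary, so the following is a minimal sketch of how low-cost cues might gate richer perception. The cue sources (IMU and audio activity), the single CUE_THRESHOLD parameter, and the run_rich_perception helper are illustrative assumptions, not ProAgent's actual API.

```python
# Minimal sketch of on-demand tiered perception; illustrative only, not
# ProAgent's implementation. Low-cost cues run continuously; expensive
# perception (e.g., a vision-language model on AR-glasses frames) runs
# only when the cue score crosses a threshold.
from dataclasses import dataclass
from typing import Optional

CUE_THRESHOLD = 0.6  # assumed free parameter: a "tiered perception threshold"

@dataclass
class Frame:
    imu_activity: float    # low-cost motion cue, normalized to [0, 1]
    audio_activity: float  # low-cost audio cue, normalized to [0, 1]
    image: bytes           # raw frame, processed only on demand

def cue_score(frame: Frame) -> float:
    """Cheap, always-on score derived from low-cost contextual cues."""
    return max(frame.imu_activity, frame.audio_activity)

def run_rich_perception(frame: Frame) -> dict:
    """Placeholder for expensive sensing (e.g., a VLM scene description)."""
    return {"scene": "unknown", "objects": []}

def perceive(frame: Frame) -> Optional[dict]:
    """Activate the richer tier only when low-cost cues pass the threshold."""
    if cue_score(frame) >= CUE_THRESHOLD:
        return run_rich_perception(frame)
    return None  # stay in the low-cost tier; no extra overhead incurred
```

The threshold in this sketch is the kind of free parameter listed in the ledger below: set it too low and the rich tier runs constantly, too high and genuine needs are missed, which is exactly the trade-off the referee asks to see quantified.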
If this is right
- Higher accuracy in predicting when a user needs help compared with always-on or fixed-context baselines.
- Fewer unnecessary activations that would drain resources or annoy the user.
- Support for continuous daily assistance rather than short task-specific episodes.
- Practical deployment on wearable hardware such as AR glasses without prohibitive overhead.
Where Pith is reading between the lines
- The same tiered approach could extend to other battery-limited devices like phones or earbuds where constant sensing is costly.
- User preference modeling inside the context extractor may need ongoing updates as habits change over weeks or months.
- If the low-cost cues prove too coarse in noisy environments, the system might require additional lightweight sensors to maintain reliability.
Load-bearing premise
Low-cost cues will reliably signal when richer perception is needed without missing important user needs or adding unacceptable delay.
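A worked illustration of why this premise carries the claim, under the simplifying assumption that the reasoner only ever sees episodes the low-cost gate lets through:

```latex
% Let m be the fraction of genuine user needs the low-cost gate misses,
% and r the recall of the downstream reasoner on the episodes it does see.
% Under the assumption above,
\[
  \text{recall}_{\text{end-to-end}} \;=\; (1 - m)\, r \;\le\; 1 - m ,
\]
% so no amount of reasoner quality can recover needs the cheap cues never surface.
```

Latency is a second failure mode this bound does not capture: a need can be detected but surfaced too late to be useful.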
What would settle it
Real-world trials that test whether the agent fails to activate detailed sensing for genuine user needs or incurs noticeable latency before responding.
Original abstract
Recent studies have begun to explore proactive large language model (LLM) agents that provide unobtrusive assistance by automatically leveraging contextual information, such as in code editing and in-app suggestions. However, most focus on short, task-specific episodes or on-screen contexts, rather than continuously perceiving and assisting users throughout daily life. Enabling such in-the-wild assistance requires continuous sensing of users' surroundings, which can incur substantial system overhead. In this work, we propose ProAgent, an end-to-end proactive agent system that harnesses on-demand sensory contexts to provide in-the-wild assistance. ProAgent first employs on-demand tiered perception to continuously sense users' surroundings by integrating low-cost contextual cues with richer perception on demand, and uses proactive-oriented context extraction to derive hierarchical contexts integrating both sensory contexts and human preferences. ProAgent then employs a context-aware proactive reasoner to infer user needs and invokes external tools to deliver proactive assistance. We implement ProAgent on AR glasses and evaluate it on a public dataset and a real-world dataset. Results demonstrate that ProAgent achieves up to 27.7% higher proactive prediction accuracy and 20.5% lower false detection than state-of-the-art baselines. A user study with 20 participants shows that 85% were satisfied with ProAgent and willing to use it in daily life.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ProAgent, an end-to-end proactive LLM agent system for in-the-wild assistance that integrates on-demand tiered perception (low-cost contextual cues triggering richer sensing), proactive-oriented hierarchical context extraction incorporating sensory data and user preferences, and a context-aware reasoner that infers needs and invokes external tools. The system is implemented on AR glasses and evaluated on a public dataset and a real-world dataset, reporting up to 27.7% higher proactive prediction accuracy and 20.5% lower false detection versus state-of-the-art baselines, along with 85% user satisfaction in a 20-participant study.
Significance. If the empirical gains hold under broader conditions, the work would meaningfully advance proactive LLM agents by showing how tiered, on-demand sensing can reduce continuous overhead while maintaining relevance for daily-life assistance. The combination of low-cost cues with hierarchical context and tool invocation provides a concrete architecture that could be extended to other wearable or ambient platforms.
major comments (2)
- [Evaluation section (results on public and real-world datasets)] The central claim of 27.7% higher proactive prediction accuracy and 20.5% lower false detection depends on the tiered perception pipeline correctly activating richer sensing only when low-cost cues indicate relevant needs. No ablation study or error analysis is provided that quantifies miss rates of the low-cost cues under atypical environments, preference shifts, or novel situations (see Evaluation section and results tables). Without this, it is unclear whether the reported gains generalize to continuous in-the-wild operation.
- [User study subsection] The user study with 20 participants reports 85% satisfaction and willingness to use the system, yet provides insufficient detail on experimental protocol, how latency or missed detections were assessed, and comparison to baselines. This weakens support for the claim of practical daily-life applicability.
minor comments (2)
- [System architecture / tiered perception description] Clarify the exact low-cost cues, perception thresholds, and activation logic in the tiered perception module; the current description leaves the decision criteria somewhat underspecified.
- [Abstract and Evaluation] In the abstract and results, state the precise datasets and conditions under which the maximum 27.7% accuracy gain is observed rather than reporting only the peak value.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address the two major points below and will revise the manuscript to strengthen the evaluation and user study sections.
Point-by-point responses
- Referee: [Evaluation section (results on public and real-world datasets)] The central claim of 27.7% higher proactive prediction accuracy and 20.5% lower false detection depends on the tiered perception pipeline correctly activating richer sensing only when low-cost cues indicate relevant needs. No ablation study or error analysis is provided that quantifies miss rates of the low-cost cues under atypical environments, preference shifts, or novel situations (see Evaluation section and results tables). Without this, it is unclear whether the reported gains generalize to continuous in-the-wild operation.
Authors: We agree that quantifying the miss rates of the low-cost cues is important for demonstrating generalization. In the revised manuscript we will add an ablation study and error analysis subsection to the Evaluation section. This will report miss rates and failure cases of the tiered perception module on both the public and real-world datasets under atypical environments, preference shifts, and novel situations, together with an analysis of how these errors affect overall proactive prediction accuracy and false detections; a minimal sketch of such metrics appears after these responses. Revision: yes.
- Referee: [User study subsection] The user study with 20 participants reports 85% satisfaction and willingness to use the system, yet provides insufficient detail on experimental protocol, how latency or missed detections were assessed, and comparison to baselines. This weakens support for the claim of practical daily-life applicability.
Authors: We acknowledge that additional protocol details are needed. In the revised User Study subsection we will expand the description to include the full experimental protocol (participant recruitment, demographics, task scenarios, and session structure), the specific methods used to measure and log latency and missed detections, and direct comparisons of user satisfaction and perceived usefulness against the same baselines used in the quantitative evaluation. Revision: yes.
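To make the promised error analysis concrete, here is a minimal sketch of the metrics such an ablation could report: end-to-end prediction accuracy, false detection rate, and the miss rate of the low-cost cue gate. The per-episode labels (need, cue_fired, predicted_need) and these exact definitions are assumptions for illustration, not the paper's formulas.

```python
# Illustrative metric definitions for the promised ablation/error analysis.
# The paper's exact definitions of "proactive prediction accuracy" and
# "false detection" are not given here, so these are assumptions, not
# ProAgent's formulas.

def tiered_metrics(episodes: list[dict]) -> dict:
    """Each episode is a dict with:
       'need'           -- bool, ground-truth user need in this episode
       'cue_fired'      -- bool, low-cost cues activated richer sensing
       'predicted_need' -- bool, the reasoner's final proactive decision
    """
    if not episodes:
        raise ValueError("episodes must be non-empty")
    needs = [e for e in episodes if e["need"]]
    no_needs = [e for e in episodes if not e["need"]]

    # Miss rate of the low-cost gate: genuine needs where rich sensing never ran.
    cue_miss_rate = (
        sum(1 for e in needs if not e["cue_fired"]) / len(needs) if needs else 0.0
    )
    # End-to-end accuracy of proactive predictions over all episodes.
    accuracy = sum(
        1 for e in episodes if e["predicted_need"] == e["need"]
    ) / len(episodes)
    # False detections: assistance offered when no genuine need existed.
    false_detection_rate = (
        sum(1 for e in no_needs if e["predicted_need"]) / len(no_needs)
        if no_needs else 0.0
    )
    return {
        "cue_miss_rate": cue_miss_rate,
        "accuracy": accuracy,
        "false_detection_rate": false_detection_rate,
    }
```

For example, tiered_metrics([{'need': True, 'cue_fired': False, 'predicted_need': False}]) reports a cue miss rate of 1.0, the failure mode the load-bearing premise above rules out.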
Circularity Check
No circularity: empirical system evaluation on external datasets
full rationale
The paper describes an implemented proactive agent system (ProAgent) using tiered perception and context extraction, then reports measured accuracy gains (up to 27.7% higher proactive prediction accuracy) from evaluation on a public dataset and a real-world dataset. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. All central claims rest on external experimental results rather than reducing to the paper's own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- tiered perception thresholds
axioms (1)
- domain assumption: contextual cues from low-cost sensors can reliably indicate the need for detailed perception
Forward citations
Cited by 7 Pith papers
- Pro²Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Pro²Assist uses multimodal egocentric perception from AR glasses to track fine-grained progress in long-horizon procedural tasks and deliver timely proactive assistance, outperforming baselines by over 21% in action u...
- From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench
ProVoice-Bench is the first framework to evaluate proactive voice agents, revealing that state-of-the-art multimodal LLMs struggle with over-triggering and context-aware reasoning.
- SensorPersona: An LLM-Empowered System for Continual Persona Extraction from Longitudinal Mobile Sensor Streams
SensorPersona uses LLMs for hierarchical reasoning on longitudinal mobile sensor streams to continually extract stable personas, showing up to 31.4% higher recall and 85.7% win rate over baselines on a 20-user dataset.
- Agentic Coding Needs Proactivity, Not Just Autonomy
Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.
- VisionClaw: Always-On AI Agents through Smart Glasses
VisionClaw couples continuous egocentric vision on smart glasses with speech-driven AI agents to enable hands-free real-world tasks, with lab and field studies showing faster completion and a shift toward opportunisti...
- Position: Life-Logging Video Streams Make the Privacy-Utility Trade-off Inevitable
Life-logging video streams create an inevitable privacy-utility trade-off that is a foundational challenge for always-on AI systems.
- PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
PASK introduces the DD-MM-PAS paradigm for streaming proactive agents with intent-aware detection, hybrid memory modeling, and a new real-world benchmark where the IntentFlow model matches top LLMs on latency while fi...