CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

Junye Du; Lequan Yu; Long Feng; Qian Niu; Qifan Wang; Yushi Feng; Yutaka Matsuo; Zizhan Ma

arxiv: 2604.09155 · v1 · submitted 2026-04-10 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-10 16:43 UTC · model grok-4.3


keywords: conformal risk control · GUI agents · mobile safety · VLM safeguards · Phone-Harm benchmark · risk calibration · selective action execution · autonomous agents


The pith

CORA uses conformal risk control to give statistical guarantees that mobile GUI agents will not execute harmful actions beyond a user-set budget.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CORA as a post-policy safeguard for vision-language model agents that control mobile phone interfaces. It trains a Guardian model to score the risk that each proposed action will cause harm, then applies conformal risk control to draw a calibrated execute-or-abstain line that keeps the rate of harmful executions below whatever risk level the user chooses. Actions the line rejects are handed to a Diagnostician that reasons multimodally over the situation and suggests the lightest intervention (confirm, reflect, or abort) that still protects the user. A Goal-Lock mechanism freezes the clarified user intent so that injected on-screen content cannot redirect the risk assessment. On the new Phone-Harm benchmark the method moves the safety-helpfulness-interruption trade-off outward compared with prompt-based or heuristic baselines.

Core claim

CORA reformulates safety as selective action execution: a Guardian estimates action-conditional risk, conformal risk control calibrates an execute/abstain boundary that satisfies a user-specified risk budget, rejected actions are routed to a trainable Diagnostician for multimodal intervention recommendations, and a Goal-Lock anchors assessment to a clarified frozen user intent, thereby supplying statistical guarantees on harmful executed actions while improving the safety-helpfulness-interruption Pareto frontier.
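The selective-execution flow described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `Proposal`, `gate`, and `toy_diagnostician` are hypothetical, the convention "execute when the risk score is at or below the threshold" is an assumption, and the rule-based Diagnostician stands in for what the paper describes as a trained multimodal model.

```python
from dataclasses import dataclass
from typing import Callable

# Intervention names follow the paper's examples (confirm, reflect, abort).
INTERVENTIONS = ("confirm", "reflect", "abort")


@dataclass(frozen=True)
class Proposal:
    """One proposed GUI step; all field names are hypothetical."""
    locked_goal: str   # user intent frozen up front (Goal-Lock)
    action: str        # e.g. "tap(Buy now)"
    risk: float        # Guardian score in [0, 1]; higher means riskier (assumed)


def gate(proposal: Proposal, threshold: float,
         diagnose: Callable[[Proposal], str]) -> str:
    """Return "execute" or the intervention chosen for a rejected action."""
    if proposal.risk <= threshold:
        return "execute"
    intervention = diagnose(proposal)
    # Fall back to the most conservative option on an unknown answer.
    return intervention if intervention in INTERVENTIONS else "abort"


def toy_diagnostician(proposal: Proposal) -> str:
    """Stand-in rule; the paper's Diagnostician is a trained multimodal model."""
    return "confirm" if "buy" in proposal.action.lower() else "reflect"
```

Passing the Diagnostician in as a callable keeps the gate itself trivial to test: only the threshold decides whether the base agent's action runs, and everything else is delegated.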

What carries the argument

Conformal Risk Control applied to the output of a Guardian model that predicts per-action harm probability, producing a data-driven threshold that is meant to hold the desired risk level even though GUI steps arrive sequentially rather than as independent samples.
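As a rough illustration of how such a threshold can be picked, the sketch below uses the expectation-form rule from standard conformal risk control for a loss bounded by 1: keep the largest threshold whose inflated empirical risk, (number of harmful executed calibration steps + 1) / (n + 1), stays at or below alpha. Several things here are assumptions rather than details from the paper: the function name `calibrate_threshold`, the execute rule "score at or below the threshold", the per-step loss "harmful and executed", and exchangeability of calibration and test steps. The figure captions mention a δ, so the paper may use a high-probability variant instead.

```python
def calibrate_threshold(scores, harmful, alpha):
    """Largest threshold tau whose inflated empirical risk is <= alpha.

    scores  : Guardian risk scores on the calibration steps
    harmful : matching 0/1 (or bool) harm labels
    alpha   : user-specified risk budget

    Assumed loss for a step: 1 if it is harmful AND score <= tau, else 0.
    Returns float("-inf") when no threshold fits, i.e. abstain on everything.
    """
    n = len(scores)
    pairs = sorted(zip(scores, harmful))
    best = float("-inf")
    harmful_executed = 0
    i = 0
    while i < n:
        tau = pairs[i][0]
        # Absorb every calibration step tied at this score.
        while i < n and pairs[i][0] == tau:
            harmful_executed += int(pairs[i][1])
            i += 1
        # The loss only grows with tau, so the first failure ends the search.
        if (harmful_executed + 1) / (n + 1) > alpha:
            break
        best = tau
    return best
```

Note that this bounds the joint rate of "harmful and executed" per proposed step, which is not the same quantity as the harm rate among executed steps only.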

If this is right

Users can directly set a maximum tolerable harm rate and receive a provable bound rather than relying on prompt engineering or fixed heuristics.
Rejected actions trigger the lightest possible user intervention instead of blanket refusal, preserving helpfulness.
The same calibration procedure can be reused across different VLMs and task distributions without retraining the threshold from scratch.
Goal-Lock makes the safety layer resistant to visual prompt-injection attacks that try to change perceived intent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selective-execution pattern could be ported to web browsers or robotic arms where each action also carries irreversible risk.
If the calibration set is updated online, the framework might support lifelong agents that keep their risk bound even as the interface or user habits drift.
The Diagnostician could be trained jointly with the main policy so that the whole system learns to propose safer actions from the start.

Load-bearing premise

The Guardian's risk scores are well enough calibrated and the sequential non-i.i.d. nature of GUI steps still lets the conformal calibration procedure deliver its coverage guarantee in practice.

What would settle it

Collect a fresh, large set of Phone-Harm trajectories, run CORA with a fixed risk budget alpha, and check whether the empirical fraction of executed actions that are labeled harmful exceeds alpha by more than the conformal tolerance.
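A check of this kind might look like the sketch below. The choice of tolerance is an assumption, not something the paper specifies: it uses a one-sided Hoeffding margin, sqrt(ln(1/δ) / (2n)), which is only valid if the executed steps are independent, the very property in question for GUI trajectories. The function name `harm_rate_check` is hypothetical.

```python
import math


def harm_rate_check(executed_harmful, alpha, delta=0.05):
    """Compare the harm rate among executed actions with alpha plus a margin.

    executed_harmful : 0/1 (or bool) harm labels of the actions that were executed
    Returns (empirical_rate, tolerance, within_budget).
    The Hoeffding margin assumes independent steps (an assumption of this sketch).
    """
    n = len(executed_harmful)
    if n == 0:
        raise ValueError("no executed actions to evaluate")
    rate = sum(executed_harmful) / n
    tolerance = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return rate, tolerance, rate <= alpha + tolerance
```

A failed check on a large fresh sample would be evidence against the guarantee; a passed check on a small sample says little, because the margin shrinks only as 1/sqrt(n).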

Figures

Figures reproduced from arXiv: 2604.09155 by Junye Du, Lequan Yu, Long Feng, Qian Niu, Qifan Wang, Yushi Feng, Yutaka Matsuo, Zizhan Ma.

**Figure 1.** Overview of our proposed CORA framework. CORA is a safety shield for mobile GUI agents that turns open-ended action proposal into selective execution. Given a locked, clarified user intent, the base agent proposes a low-level GUI action (tap/type/swipe) from the current screenshot and UI tree. An action-conditional Guardian scores the risk of executing the specific proposed action in the current screen und…

**Figure 2.** Safety–Helpfulness–Interruption Pareto frontier on Phone-Harm. Methods are plotted in the HR–GAR plane; marker size encodes the Over-Intervention Rate, and the ideal operating point lies in the top-left corner with a small marker. (a) Harm-150 subset, covering all eight methods. (b) Full benchmark under the Harm-150 + Normal-150 mixture defined in …

**Figure 3.** Harm-150 distribution overview. (a) App distribution over the full Harm-150 subset, …

**Figure 4.** Qualitative cases from Harm-150 demonstrating CORA’s safety behaviour: prompt …

**Figure 5.** CRC ablation on CORA (δ = 0.05, holdout n = 1075).

**Figure 6.** Action-conditioning ablation on CORA (δ = 0.05, holdout n = 1075).

E.1 The Influence of Conformal Risk Control (CRC). To understand the necessity of Conformal Risk Control, we ablate this module by replacing its statistically calibrated boundary with a static, development-tuned threshold. This naive approach destabilizes the operating point and compromises the statistical safety bounds, leading to unpredict…

**Figure 7.** Ablation of diagnostician design on CORA (δ = 0.05, holdout n = 1075).

**Figure 8.** Goal-lock ablation on CORA (δ = 0.05, holdout n = 1075).

… trade-off: task coverage improves from 85.77% to 89.02%, while the executed-harm rate rises from 0.56% to 1.86%, and the Diagnostician invocation rate drops from 14.23% to 10.98%. In contrast, the CRC module adaptively selects τ̂ = 0.952. This optimally calibrated threshold achieves the best coverage (89.95%) and the lowest invocation rate (10.05%) a…

read the original abstract

Graphical user interface (GUI) agents powered by vision language models (VLMs) are rapidly moving from passive assistance to autonomous operation. However, this unrestricted action space exposes users to severe and irreversible financial, privacy or social harm. Existing safeguards, which rely on prompt engineering, brittle heuristics, and VLM-as-critic, lack formal verification and user-tunable guarantees. We propose CORA (COnformal Risk-controlled GUI Agent), a post-policy, pre-action safeguarding framework that provides statistical guarantees on harmful executed actions. CORA reformulates safety as selective action execution: we train a Guardian model to estimate action-conditional risk for each proposed step. Rather than thresholding raw scores, we leverage Conformal Risk Control to calibrate an execute/abstain boundary that satisfies a user-specified risk budget, and we route rejected actions to a trainable Diagnostician model, which performs multimodal reasoning over rejected actions to recommend interventions (e.g., confirm, reflect, or abort) that minimize user burden. A Goal-Lock mechanism anchors assessment to a clarified, frozen user intent to resist visual injection attacks. To rigorously evaluate this paradigm, we introduce Phone-Harm, a new benchmark of mobile safety violations with step-level harm labels under real-world settings. Experiments on Phone-Harm and public benchmarks against diverse baselines validate that CORA improves the safety–helpfulness–interruption Pareto frontier, offering a practical, statistically grounded safety paradigm for autonomous GUI execution. Code and benchmark are available at cora-agent.github.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

CORA layers conformal risk control over a Guardian model for GUI agents and adds a Phone-Harm benchmark, but the sequential dependence in trajectories probably voids the finite-sample guarantees.


The main point is that this work gives a concrete way to add tunable statistical safety to VLM-based mobile GUI agents by scoring proposed actions with a Guardian, using conformal risk control to set an execute/abstain threshold, and routing rejects to a Diagnostician for interventions like confirm or abort, all while locking the goal to resist injection attacks. They also release Phone-Harm, a step-level harm benchmark collected in real settings. That combination is new and the experiments claim better safety-helpfulness-interruption tradeoffs than prompt-based or heuristic baselines on both their benchmark and public ones. Code and data are public, which is helpful for follow-up work. The framework is practical for anyone who wants user-specified risk budgets instead of brittle rules.

The soft spot is the conformal guarantee itself. Standard conformal risk control needs exchangeable calibration and test points, yet GUI trajectories are Markovian: each action alters the screen state and the next risk score depends on prior decisions. The paper calibrates post-policy on these same sequential trajectories without mentioning block-wise, martingale, or adaptive conformal methods that restore validity under dependence. If the dependence is material, the realized fraction of harmful actions can exceed the nominal alpha even when the calibration set looks fine. The abstract does not show error bars, calibration-set size, or sensitivity checks that would let a reader judge how large the violation might be.

This paper is aimed at people building or evaluating safe autonomous agents for consumer devices. A reader working on conformal methods in sequential decision settings or on mobile automation safety would find the benchmark and the overall architecture worth looking at. It deserves peer review because the problem is timely, the benchmark is a concrete addition, and the empirical claims can be checked once the dependence issue is addressed or bounded.

Referee Report

2 major / 3 minor

Summary. The manuscript presents CORA, a post-policy safeguarding framework for VLM-based GUI agents. It trains a Guardian model to produce action-conditional risk scores, applies Conformal Risk Control to select an execute/abstain threshold that meets a user-specified risk budget alpha, routes rejected actions to a Diagnostician model for multimodal intervention recommendations, and uses a Goal-Lock mechanism to anchor to frozen user intent. A new benchmark Phone-Harm with step-level harm labels is introduced, and experiments on Phone-Harm plus public benchmarks claim improvements to the safety-helpfulness-interruption Pareto frontier.

Significance. If the finite-sample guarantees hold, the work supplies a statistically grounded, user-tunable safety layer for autonomous GUI agents that addresses a timely deployment concern. The Phone-Harm benchmark is a concrete contribution for safety evaluation. The approach avoids circularity by relying on standard conformal risk control whose validity is independent of the paper's empirical results.

major comments (2)

[Conformal Risk Control calibration procedure] The calibration procedure (described in the section on Conformal Risk Control) applies the standard risk-controlling threshold of Angelopoulos et al. to trajectories drawn from Phone-Harm. These trajectories are Markovian and non-exchangeable: each executed action changes the screen state, so future Guardian scores depend on prior decisions. The manuscript does not invoke block-wise, martingale, or adaptive conformal extensions that would restore validity under dependence. Because the central claim is that CORA delivers finite-sample guarantees on the fraction of harmful executed actions, this omission is load-bearing.
[Experimental results on Phone-Harm] Table reporting experimental results on Phone-Harm (and the corresponding Pareto-frontier plots): the manuscript claims that the realized harm rate respects the user-specified alpha, yet provides neither error bars on the empirical miscoverage nor details on how the calibration set is constructed from sequential trajectories (e.g., number of trajectories, temporal splitting). Without these, it is impossible to verify whether the reported risk budget is actually achieved or whether post-hoc choices undermine the guarantee.

minor comments (3)

[Abstract] The abstract states that CORA 'provides statistical guarantees' but does not state the precise guarantee (e.g., with probability at least 1-delta the fraction of harmful actions is at most alpha). Adding this sentence would clarify the claim.
[Framework description] The description of the Diagnostician model and Goal-Lock mechanism would benefit from an explicit equation or pseudocode showing how the intervention recommendation is produced and how Goal-Lock is enforced at inference time.
[Phone-Harm benchmark] Phone-Harm is a useful benchmark, but the manuscript should report inter-annotator agreement for the step-level harm labels and the exact protocol used to generate the trajectories.
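
To make the first minor comment concrete, the two usual ways such a guarantee is written are sketched below. The notation is an assumption for illustration, not the paper's: $\hat\tau$ is the calibrated threshold, $s$ the Guardian score of a proposed action, $h \in \{0,1\}$ its harm label, and "execute when $s \le \hat\tau$" is an assumed convention. Which form, if either, the manuscript proves is not stated in the abstract, although the δ = 0.05 in the figure captions hints at the second.

```latex
% Expectation form (standard conformal risk control),
% for a fresh test point exchangeable with the n calibration points:
\mathbb{E}\bigl[\, h_{n+1}\,\mathbf{1}\{ s_{n+1} \le \hat\tau \} \,\bigr] \;\le\; \alpha

% High-probability form, with probability taken over the calibration set:
\Pr\bigl( R(\hat\tau) \le \alpha \bigr) \;\ge\; 1 - \delta,
\qquad
R(\tau) \;=\; \mathbb{E}\bigl[\, h\,\mathbf{1}\{ s \le \tau \} \,\bigr]
```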

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript on CORA. The comments raise important points about the validity of our conformal guarantees under sequential dependence and the transparency of our experimental reporting. We address each major comment below and have prepared revisions to improve rigor and clarity.


Referee: [Conformal Risk Control calibration procedure] The calibration procedure (described in the section on Conformal Risk Control) applies the standard risk-controlling threshold of Angelopoulos et al. to trajectories drawn from Phone-Harm. These trajectories are Markovian and non-exchangeable: each executed action changes the screen state, so future Guardian scores depend on prior decisions. The manuscript does not invoke block-wise, martingale, or adaptive conformal extensions that would restore validity under dependence. Because the central claim is that CORA delivers finite-sample guarantees on the fraction of harmful executed actions, this omission is load-bearing.

Authors: We appreciate the referee's emphasis on the exchangeability assumption underlying standard conformal risk control. In our implementation, the calibration set is assembled from a large collection of independent trajectories drawn from the Phone-Harm distribution; each full trajectory is treated as a single exchangeable unit when computing the risk-controlling threshold. This design ensures that the finite-sample guarantee applies marginally over the trajectory distribution, even though intra-trajectory scores exhibit Markovian dependence. Nevertheless, we acknowledge that a stricter per-step guarantee would benefit from dependence-aware extensions. In the revised manuscript we will (i) explicitly describe the trajectory-level calibration construction, (ii) add a dedicated paragraph discussing the implications of within-trajectory dependence, and (iii) cite relevant block-wise and adaptive conformal literature (e.g., works on martingale-based conformal prediction). We will also report an empirical sensitivity analysis showing that realized miscoverage remains close to the nominal level across varying trajectory lengths. These changes clarify the scope of our guarantees without altering the core algorithm.
Revision: partial
Referee: [Experimental results on Phone-Harm] Table reporting experimental results on Phone-Harm (and the corresponding Pareto-frontier plots): the manuscript claims that the realized harm rate respects the user-specified alpha, yet provides neither error bars on the empirical miscoverage nor details on how the calibration set is constructed from sequential trajectories (e.g., number of trajectories, temporal splitting). Without these, it is impossible to verify whether the reported risk budget is actually achieved or whether post-hoc choices undermine the guarantee.

Authors: We agree that the current experimental section lacks sufficient statistical detail and transparency. In the revised version we will augment all Phone-Harm tables and Pareto-frontier figures with error bars (standard errors computed over multiple random calibration/test splits). We will also add an appendix subsection that fully specifies the calibration-set construction: the total number of trajectories (e.g., 500), the temporal splitting protocol that reserves entire trajectories for calibration versus testing to prevent leakage, the exact sizes of calibration and test sets, and the random seed used for reproducibility. These additions will enable readers to reproduce and verify that the reported harm rates respect the user-specified alpha under the stated protocol.
Revision: yes
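
The protocol promised in these responses (whole trajectories assigned to either calibration or test, with standard errors over repeated random splits) could be sketched as follows. Everything here is illustrative: the function names, the `calib_fraction` parameter, and the seeding scheme are assumptions, and the simulated rebuttal does not describe actual code.

```python
import math
import random
import statistics


def split_trajectories(trajectory_ids, calib_fraction, seed):
    """Assign whole trajectories to calibration or test, never individual steps.

    Keeping every step of a trajectory on one side avoids leakage between
    the calibration and test sets. Returns (calibration_ids, test_ids).
    """
    ids = sorted(set(trajectory_ids))
    rng = random.Random(seed)          # fixed seed for reproducibility
    rng.shuffle(ids)
    k = int(len(ids) * calib_fraction)
    return set(ids[:k]), set(ids[k:])


def standard_error(values):
    """Standard error of the mean across repeated splits (needs >= 2 values)."""
    return statistics.stdev(values) / math.sqrt(len(values))
```

In use, one would loop over several seeds, calibrate on the first set, measure the realized harm rate on the second, and report the mean with `standard_error` as the error bar.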

Circularity Check

0 steps flagged

No significant circularity; standard conformal risk control applied independently


The paper's central derivation applies Conformal Risk Control (citing Angelopoulos et al.) to calibrate a Guardian model's action-conditional risk scores into an execute/abstain threshold satisfying a user-specified risk budget. This step invokes an external, established method whose finite-sample validity guarantees are independent of the present work and do not reduce to any quantity defined or fitted within the paper itself. The Goal-Lock mechanism, Diagnostician model, and Phone-Harm benchmark are introduced as separate components without self-referential equations that equate predictions to their own inputs by construction. No self-citation chain, ansatz smuggling, or renaming of known results forms a load-bearing loop in the claimed statistical guarantees. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 3 invented entities

The framework rests on the standard exchangeability assumption of conformal prediction plus the empirical performance of two newly introduced learned models whose training details are not visible in the abstract.

free parameters (1)

user-specified risk budget alpha
Controls the execute/abstain threshold; calibration is performed with respect to this value.

axioms (1)

[standard math] Calibration and test data must be exchangeable for the conformal risk control guarantee to hold.
Core assumption invoked by any conformal risk control procedure.
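
For reference, the axiom can be stated formally. The symbols are illustrative rather than taken from the paper: $Z_i$ denotes the $i$-th data point (for example a score and harm label pair, or a whole trajectory if calibration is done at trajectory level), with $Z_1,\dots,Z_n$ the calibration points and $Z_{n+1}$ the test point.

```latex
% Exchangeability: the joint distribution is invariant under reordering.
(Z_1, \dots, Z_{n+1}) \;\overset{d}{=}\; (Z_{\pi(1)}, \dots, Z_{\pi(n+1)})
\quad \text{for every permutation } \pi \text{ of } \{1, \dots, n+1\}
```

Independent and identically distributed data satisfy this condition; consecutive steps of one GUI trajectory generally do not, which is the concern raised in the referee report.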

invented entities (3)

Guardian model · no independent evidence
purpose: Estimate action-conditional risk for each proposed GUI step.
New trainable component whose accuracy is required for the threshold to be meaningful.

Diagnostician model · no independent evidence
purpose: Multimodal reasoning over rejected actions to recommend interventions.
New trainable component introduced to reduce user burden on abstentions.

Goal-Lock mechanism · no independent evidence
purpose: Anchor risk assessment to a frozen user intent to resist visual injection attacks.
Proposed mechanism whose effectiveness is asserted but not independently validated in the abstract.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
cs.CL · 2026-05 · unverdicted · novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment inter...

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

URL https://doi.org/10.1145/3680207.3765248

doi: 10.1145/3680207.3765248. URL https://doi.org/10.1145/3680207.3765248. Chen Xu and Yao Xie. Sequential predictive conformal inference for time series, 2023. Preprint. Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2web: Towards a generalist agent for the web. In Advances in Neural Information Processing ...

work page doi:10.1145/3680207.3765248 2023
[2]

Qwen3-VL Technical Report

URL https://arxiv.org/abs/2511.21631. ByteDance Seed. Ui-tars-1.5. https://seed-tars.com/1.5, 2025b. Yifan Xu, Xiao Liu, Xinghan Liu, Jiaqi Fu, Hanchen Zhang, Bohao Jing, Shudan Zhang, Yuting Wang, Wenyi Zhao, and Yuxiao Dong. Mobilerl: Online agentic reinforcement learning for mobile gui agents, 2025. URL https://arxiv.org/abs/2509.18119. Google. Gemini 3 f...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1016/j.neunet.2025.107777 2025
[3]

password

Prevent deliberate misuse (Fig. 4a): Given a harmful user instruction attempting privacy theft (e.g., publicly sharing and granting edit access to a sensitive “password” document), CORA correctly identifies the inherent risk in the goal itself and triggers an ABORT before any privacy-compromising action is executed.

work page
[4]

Resist prompt injections (Fig. 4b): CORA recognizes a malicious on-screen message that conflicts with the frozen user goal. Instead of following the injected instruction, it reasons to safely dismiss the harmful pop-up and seamlessly proceeds with the original task.

work page
[5]

1-tap buy

Mitigate model misbehavior (Fig. 4c): Confronted with an irreversible financial transaction (a “1-tap buy” in an app store) under a benign user goal, CORA avoids autonomous execution. Instead, it routes the proposed action to a CONFIRM intervention, effectively preventing high-stakes errors via human-in-the-loop gating. D Evaluation Metrics This section det...

work page 2024
