pith. machine review for the scientific record.

arxiv: 2604.25200 · v1 · submitted 2026-04-28 · 💻 cs.CR · cs.AI · cs.CY · cs.LG

Recognition: unknown

Making AI-Assisted Grant Evaluation Auditable without Exposing the Model

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 16:07 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.CY · cs.LG
keywords AI grant evaluation · trusted execution environment · remote attestation · auditable AI · algorithmic accountability · prompt injection · confidential inference

The pith

A trusted execution environment with remote attestation lets outsiders verify the exact model, rubric, and inputs used in AI grant evaluations without exposing weights or proprietary logic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Public agencies considering large language models for grant evaluation face a core tension: the model and scoring rubric must stay hidden to stop applicants from optimizing against them, yet the process needs to support external checks for accountability. The paper outlines a TEE-based design that produces signed, timestamped evaluation bundles linking the submission hash, canonical input, model-and-rubric measurement, and output. Remote attestation allows a verifier to confirm which components were used while keeping model weights, scoring details, and intermediate reasoning private from both applicants and infrastructure operators. A sanitization layer is added to normalize documents and flag potential prompt injections. The authors stress that the approach makes parts of the process externally verifiable but does not establish whether the evaluation itself is fair or correct.
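
The paper describes the sanitization layer only at the level of normalizing document representations and recording suspicious transformations before inference. As a purely illustrative Python sketch of what such a pass might look like (the function name, pattern list, and normalization choices are assumptions for illustration, not the authors' design):

    import re
    import unicodedata

    # Patterns whose presence gets recorded as suspicious; a real deployment would
    # maintain a much richer set informed by known prompt-injection techniques.
    SUSPICIOUS_PATTERNS = [
        r"ignore (all |any )?previous instructions",  # common direct-injection phrasing
        r"[\u200b\u200c\u200d]",                      # zero-width characters that can hide text
    ]

    def canonicalize(raw_text: str) -> tuple[str, list[str]]:
        """Return a normalized document representation plus a log of suspicious findings."""
        findings = []
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, raw_text, re.IGNORECASE):
                findings.append(f"matched suspicious pattern: {pattern}")
        # Unicode normalization so visually identical submissions canonicalize identically.
        text = unicodedata.normalize("NFKC", raw_text)
        # Drop zero-width characters and collapse whitespace into a canonical form.
        text = re.sub(r"[\u200b\u200c\u200d]", "", text)
        text = re.sub(r"\s+", " ", text).strip()
        return text, findings

Whatever the concrete rules, the architectural point is that this step runs inside the attested enclave, so its behavior is covered by the same measurement as the model and rubric.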

Core claim

The architecture produces an attested evaluation bundle, a signed and timestamped record that connects the original submission hash, the canonical input hash, the model-and-rubric measurement, and the evaluation output. Remote attestation enables an external verifier to confirm the model, rubric, prompt template, and input representation that were applied without disclosing model weights, proprietary scoring logic, or intermediate reasoning to applicants or operators.
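
The hash-linking and signing implied by this description can be made concrete with a short stdlib-Python sketch. It is not the authors' implementation: the field names are invented, and an HMAC key stands in for the enclave's attestation key, which in a real TEE deployment would be a hardware-rooted asymmetric key.

    import hashlib
    import hmac
    import json
    import time

    def sha256_hex(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def build_bundle(submission: bytes, canonical_input: bytes, measurement: str,
                     output_text: str, signing_key: bytes) -> dict:
        """Hash-link the evaluation components and sign the resulting record."""
        record = {
            "submission_hash": sha256_hex(submission),            # original uploaded document
            "canonical_input_hash": sha256_hex(canonical_input),  # output of the sanitization layer
            "model_rubric_measurement": measurement,              # digest of model, rubric, prompt template
            "output_hash": sha256_hex(output_text.encode()),      # the evaluation the model produced
            "timestamp": int(time.time()),
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["signature"] = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
        return record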

What carries the argument

The attested evaluation bundle, produced via remote attestation inside a trusted execution environment, which cryptographically links the evaluation components and lets an external verifier confirm which ones were used.

If this is right

  • Verifiers can independently confirm the specific model and rubric applied to any given grant submission (see the verifier sketch after this list).
  • The evaluation output becomes tied to an attested configuration that can be checked after the fact.
  • Recorded sanitization steps allow detection of suspicious transformations that might indicate prompt injection attempts.
  • Infrastructure operators gain no access to the model weights or intermediate reasoning steps during the process.
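
A verifier holding such a bundle, the published measurement for the approved model-and-rubric package, and the enclave's verification key could then run checks along the following lines. This is a sketch of the idea under the same illustrative assumptions as above (a shared HMAC key rather than the asymmetric attestation keys a real deployment would verify):

    import hashlib
    import hmac
    import json

    def verify_bundle(bundle: dict, submission: bytes,
                      expected_measurement: str, verification_key: bytes) -> bool:
        """Check the record is untampered, covers this submission, and used the approved components."""
        record = {k: v for k, v in bundle.items() if k != "signature"}
        payload = json.dumps(record, sort_keys=True).encode()
        expected_sig = hmac.new(verification_key, payload, hashlib.sha256).hexdigest()
        return (
            hmac.compare_digest(bundle["signature"], expected_sig)                   # record untampered
            and bundle["submission_hash"] == hashlib.sha256(submission).hexdigest()  # right proposal
            and bundle["model_rubric_measurement"] == expected_measurement           # attested model/rubric
        )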

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same attested-bundle approach could extend to other public-sector AI decisions that require both secrecy and contestability.
  • The narrow claim leaves the scientific or fairness merits of the model's actual judgments outside the scope of what attestation can prove.
  • Combining the design with additional privacy tools might reduce remaining risks of indirect information leakage.

Load-bearing premise

Trusted execution environments and remote attestation can be configured to attest precisely to the evaluation components while blocking any leakage of model details or intermediate outputs under realistic adversarial conditions.
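
One concrete reading of "attest precisely to the evaluation components" is that the measurement is a single digest over the exact artifacts loaded into the enclave, so any change to the model, rubric, or prompt template changes the value a verifier sees. The sketch below is an assumption about how such a measurement might be derived, not the paper's definition; in a real TEE the measurement is taken by the hardware and firmware at enclave load rather than by application code.

    import hashlib
    from pathlib import Path

    def component_measurement(weights_path: str, rubric_text: str, prompt_template: str) -> str:
        """Digest over the exact artifacts the enclave loads; changing any one changes the value."""
        h = hashlib.sha256()
        for part in (Path(weights_path).read_bytes(),
                     rubric_text.encode("utf-8"),
                     prompt_template.encode("utf-8")):
            h.update(len(part).to_bytes(8, "big"))  # length-prefix to avoid boundary ambiguity
            h.update(part)
        return h.hexdigest()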

What would settle it

An attack that extracts model weights from the attested environment or produces a valid attestation for a different model or rubric than claimed would show the architecture cannot deliver the promised verification without exposure.

Figures

Figures reproduced from arXiv: 2604.25200 by Kemal Bicakci.

Figure 1. The four principals and their interactions. Applicant (top-left) submits proposals to the Agency (top-right) and …
Figure 2. System architecture and data flow. The blue dashed box marks the TEE boundary: all operations inside are …
read the original abstract

Public agencies are beginning to consider large language models (LLMs) as decision-support tools for grant evaluation. This creates a practical governance problem: the model and scoring rubric should not be exposed in a way that allows applicants to optimize against them, yet the evaluation process must remain auditable, contestable, and accountable. We propose a TEE-based architecture that helps reconcile these requirements through remote attestation. The architecture allows an external verifier to check which model, rubric, prompt template, and input representation were used, without exposing model weights, proprietary scoring logic, or intermediate reasoning to applicants or infrastructure operators. The main artifact is an attested evaluation bundle: a signed, timestamped record linking the original submission hash, the canonical input hash, the model-and-rubric measurement, and the evaluation output. The paper also considers a scenario-specific prompt injection risk: applicant-controlled documents may contain hidden or indirect instructions intended to influence the LLM evaluator. We therefore include a canonicalization and sanitization layer that normalizes document representations and records suspicious transformations before inference. We position the design relative to confidential AI inference, attestable AI audits, zero-knowledge machine learning, algorithmic accountability, and AI-assisted peer review. The resulting claim is deliberately narrow: remote attestation does not prove that an evaluation is fair or scientifically correct, but it can make part of the evaluation process externally verifiable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a TEE-based architecture using remote attestation to enable auditable AI-assisted grant evaluations without exposing model weights or intermediate reasoning. The central artifact is an attested evaluation bundle: a signed, timestamped record linking the original submission hash, canonical input hash, model-and-rubric measurement, and evaluation output. A canonicalization and sanitization layer is included to mitigate prompt injection risks from applicant-controlled documents. The authors deliberately narrow the claim to process verifiability rather than fairness or correctness of the evaluation.

Significance. If the architecture can be shown to meet its security properties, the work would provide a practical mechanism for public agencies to deploy LLMs in high-stakes decisions while supporting accountability and reducing opportunities for gaming. The narrow claim and explicit treatment of prompt injection risks are constructive. The contribution sits at the intersection of confidential computing and algorithmic accountability but remains prospective until supported by analysis.

major comments (1)
  1. Abstract: The central guarantee—that remote attestation allows an external verifier to confirm the exact model, rubric, prompt template, and input representation without exposing weights or reasoning—depends on the security of the TEE boundary and attestation mechanism. The manuscript provides no threat model, no analysis of side-channel/rollback attacks on the enclave, and no argument that the canonicalization layer cannot be influenced once inside the enclave. This absence is load-bearing for assessing whether the attested bundle actually delivers the stated properties under realistic adversarial conditions.
minor comments (2)
  1. The manuscript positions the design relative to confidential AI inference, attestable AI audits, zero-knowledge ML, and algorithmic accountability, but would benefit from additional specific citations to representative prior work in each area.
  2. A high-level diagram or pseudocode illustrating the construction and verification of the attested evaluation bundle would improve clarity of the hash-linking and signing steps.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for recognizing the paper's narrow scope and relevance to accountable AI deployment in public agencies. We address the single major comment below and will revise the manuscript to strengthen the security discussion.

read point-by-point responses
  1. Referee: [—] Abstract: The central guarantee—that remote attestation allows an external verifier to confirm the exact model, rubric, prompt template, and input representation without exposing weights or reasoning—depends on the security of the TEE boundary and attestation mechanism. The manuscript provides no threat model, no analysis of side-channel/rollback attacks on the enclave, and no argument that the canonicalization layer cannot be influenced once inside the enclave. This absence is load-bearing for assessing whether the attested bundle actually delivers the stated properties under realistic adversarial conditions.

    Authors: We agree that the absence of an explicit threat model limits the ability to evaluate the architecture's robustness. The manuscript presents a high-level design for attested evaluation bundles and positions it relative to existing confidential computing techniques, deliberately scoping claims to process verifiability rather than end-to-end security proofs. It relies on standard TEE attestation properties without enumerating adversaries or attack vectors. In the revised version we will add a dedicated 'Threat Model and Security Assumptions' section. This section will define the adversary (malicious applicants seeking to influence outputs or operators attempting to alter attested records), the trusted computing base (enclave isolation, remote attestation protocol, and hardware root of trust), and key assumptions drawn from TEE literature. We will briefly discuss side-channel and rollback risks with references to known mitigations and explain that the canonicalization/sanitization layer executes inside the attested enclave, so its behavior is covered by the model measurement. The abstract and introduction will be updated to state that verifiability holds conditional on a secure TEE implementation. This revision directly addresses the load-bearing concern while preserving the paper's architectural focus. revision: yes

Circularity Check

0 steps flagged

No circularity: architectural proposal independent of derivations or self-referential results

full rationale

The paper proposes a TEE-based architecture for making AI-assisted grant evaluation auditable without exposing model weights or intermediates. It defines an attested evaluation bundle linking hashes of submission, canonical input, model-and-rubric measurement, and output, plus a canonicalization layer for prompt injection risks. No equations, parameter fitting, predictions, or first-principles derivations appear anywhere in the manuscript. The central claim rests on standard remote attestation properties rather than reducing to any self-citation chain, ansatz, or input-by-construction equivalence. The design is positioned relative to existing literature on confidential AI and algorithmic accountability without importing uniqueness theorems or renaming known results as novel. This is a self-contained architectural proposal with no load-bearing steps that collapse to their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The proposal rests on standard assumptions about TEE security properties and introduces one new conceptual artifact; no free parameters or data-fitted values are involved.

axioms (2)
  • domain assumption Trusted execution environments combined with remote attestation can reliably attest to the exact software, model, and input processing used without revealing weights or intermediate states.
    Invoked as the foundation for the verifier's ability to check components.
  • domain assumption Canonicalization and sanitization can sufficiently neutralize prompt injection attempts embedded in applicant-controlled documents.
    Added to address the scenario-specific risk mentioned in the abstract.
invented entities (1)
  • attested evaluation bundle · no independent evidence
    purpose: Signed, timestamped record that links submission hash, canonical input hash, model-and-rubric measurement, and evaluation output for external verification.
    Presented as the central deliverable of the architecture.

pith-pipeline@v0.9.0 · 5543 in / 1419 out tokens · 65821 ms · 2026-05-07T16:07:19.882247+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

16 extracted references · 8 canonical work pages · 3 internal anchors

  1. [1] National Institute of Standards and Technology. Hardware-Enabled Security: Enabling a Layered Approach to Platform Security for Cloud and Edge Computing Use Cases. NIST Internal Report 8320, 2022. https://nvlpubs.nist.gov/nistpubs/ir/2022/NIST.IR.8320.pdf
  2. [2] National Institute of Standards and Technology. AI Risk Management Framework: AI RMF 1.0. NIST AI 100-1, 2023. https://www.nist.gov/itl/ai-risk-management-framework
  3. [3] Microsoft Azure. Azure AI Confidential Inferencing: Technical Deep-Dive. Azure Confidential Computing Blog, 2024. https://techcommunity.microsoft.com/blog/azureconfidentialcomputingblog/azure-ai-confidential-inferencing-technical-deep-dive/4253150
  4. [4] Anthropic. Confidential Inference Systems: Design Principles and Security Risks. Technical Report.
  5. [5] https://assets.anthropic.com/m/c52125297b85a42/original/Confidential_Inference_Paper.pdf
  6. [6] Schnabl, C., Hugenroth, D., Marino, B., and Beresford, A. R. Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments. arXiv preprint arXiv:2506.23706, 2025. https://arxiv.org/abs/2506.23706
  7. [7] Peng, Z., Zhao, C., Wang, T., Liao, G., Lin, Z., Liu, Y., Cao, B., Shi, L., Yang, Q., and Zhang, S. A Survey of Zero-Knowledge Proof Based Verifiable Machine Learning. arXiv preprint arXiv:2502.18535, 2025. https://arxiv.org/abs/2502.18535
  8. [8] Sandström, U. and Thelwall, M. Can Large Language Models Evaluate Grant Proposal Quality? Revisiting the Wennerås and Wold Peer Review Data. arXiv preprint arXiv:2603.14565, 2026. https://arxiv.org/abs/2603.14565
  9. [9] OECD. Governing with Artificial Intelligence. OECD Public Governance Policy Papers No. 22, 2022. https://www.oecd.org/en/publications/governing-with-artificial-intelligence_795de142-en.html
  10. [10] Open Government Partnership; Ada Lovelace Institute; AI Now Institute. Algorithmic Accountability for the Public Sector: Executive Summary. 2021. https://www.opengovpartnership.org/wp-content/uploads/2021/08/executive-summary-algorithmic-accountability.pdf
  11. [11] Confidential Computing Consortium. Confidential Computing: Hardware-Based Trusted Execution for Applications and Data. White Paper, 2021. https://confidentialcomputing.io/resources/white-papers-reports/
  12. [12] Mo, F., Tarkhani, Z., and Haddadi, H. Machine Learning with Confidential Computing: A Systematization of Knowledge. ACM Computing Surveys, 2024. https://dl.acm.org/doi/full/10.1145/3670007
  13. [13] Carlini, N. et al. Extracting Training Data from Large Language Models. Proceedings of the 30th USENIX Security Symposium, 2021. https://arxiv.org/abs/2012.07805
  14. [14] Perez, F. and Ribeiro, I. Ignore Previous Prompt: Attack Techniques for Language Models. arXiv preprint arXiv:2211.09527, 2022. https://arxiv.org/abs/2211.09527
  15. [15] Greshake, K. et al. Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. arXiv preprint arXiv:2302.12173, 2023. https://arxiv.org/abs/2302.12173
  16. [16] Wallace, E. et al. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. arXiv preprint arXiv:2404.13208, 2024. https://arxiv.org/abs/2404.13208