Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks

Ravi Kiran Kadaboina

arXiv: 2605.20312 · v1 · pith:4OKSMV4U · submitted 2026-05-19 · 💻 cs.CR · cs.LO · cs.MA

Pith reviewed 2026-05-21 02:07 UTC · model grok-4.3

classification: 💻 cs.CR · cs.LO · cs.MA

keywords: claim verification, autonomous agents, protocol design, formal verification, TLA+, verification artifacts, agent networks, epistemology

The pith

Pramana supplies a wire-format ClaimAttestation that classifies every consequential agent output into one of four types and pairs it with a source-linked verify operation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autonomous agents in regulated domains must emit records that auditors can replay offline to confirm what was claimed, against which source, and by what method. Existing patterns yield either non-reproducible probabilistic judgments or vendor-locked traces that outsiders cannot reconstruct. Pramana supplies the missing standardized structure: every consequential output is wrapped in a ClaimAttestation typed as measurement, inference, analogy, or citation, each with its own verify() operation. The four-type scheme is taken from classical Indian epistemology, and the full lifecycle is specified in TLA+ and then checked exhaustively by TLC. A Python reference implementation and deployment invariants for reachability, SLA bounds, and offline re-verifiability complete the proposal.

Core claim

Pramana defines the missing wire format. Every consequential agent output is wrapped in a typed ClaimAttestation with one of four variants (measurement, inference, analogy, citation), each paired with a verify() operation against the recorded source. verify() is deterministic for MeasurementClaim and CitationClaim. For InferenceClaim and AnalogyClaim, determinism is conditional on the oracle. The lifecycle is specified in TLA+ and exhaustively verified under TLC across three symmetry-reduced models: 38,563 distinct reachable states, zero invariant violations.
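A minimal Python sketch may help make the core claim concrete. It assumes a simplified shape. The field names (`quantity`, `tolerance`, `quote`, `premise`, and so on) and the `verify` signature are hypothetical; they are not taken from the Pramana reference implementation. Measurement and citation checks run deterministically against the recorded source. Inference and analogy checks go to a caller-supplied oracle, so their determinism is only as good as that oracle.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


# Hypothetical variant shapes; the real Pramana schema may differ.
@dataclass(frozen=True)
class MeasurementClaim:
    quantity: str      # what was measured
    value: float       # the value the agent claims it observed
    tolerance: float   # allowed absolute deviation on replay


@dataclass(frozen=True)
class CitationClaim:
    quote: str         # text the agent claims appears in the source


@dataclass(frozen=True)
class InferenceClaim:
    premise: str
    conclusion: str


@dataclass(frozen=True)
class AnalogyClaim:
    source_case: str
    target_case: str


Claim = Union[MeasurementClaim, CitationClaim, InferenceClaim, AnalogyClaim]
Oracle = Callable[[Claim, str], bool]


@dataclass(frozen=True)
class ClaimAttestation:
    claim: Claim
    source: str        # recorded source snapshot (serialized reading or document text)


def verify(att: ClaimAttestation, oracle: Optional[Oracle] = None) -> bool:
    """Re-check a claim against its recorded source."""
    c = att.claim
    if isinstance(c, MeasurementClaim):
        # Deterministic: parse the recorded reading and compare within tolerance.
        return abs(float(att.source) - c.value) <= c.tolerance
    if isinstance(c, CitationClaim):
        # Deterministic: the quote must appear verbatim in the recorded source.
        return c.quote in att.source
    # Inference and analogy: delegate to an oracle; determinism depends on the oracle.
    if oracle is None:
        raise ValueError("inference and analogy claims need an oracle to verify")
    return oracle(c, att.source)
```

A citation check such as `verify(ClaimAttestation(CitationClaim("shall notify"), source_text))` gives the same answer on every replay. An `InferenceClaim` does so only if the oracle is itself deterministic or is replayed from a log.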

What carries the argument

The ClaimAttestation wrapper that carries one of four claim variants and exposes a verify() operation that re-executes against the recorded source.

If this is right

Agents produce records that external auditors can re-execute without vendor-specific adapters.
Measurement and citation claims become fully deterministic under replay; inference and analogy claims become replayable when an audit oracle is supplied.
An A2A and MCP wire-extension manifest layers three deployment invariants on top of the attestation: reachability, SLA bound, and offline re-verifiability.
The TLA+ model shows that the attestation lifecycle can be exhaustively checked with zero invariant violations in symmetry-reduced state spaces.
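An exploratory pilot (n=100, 2,275 reviewer calls) found a raw 40-percentage-point difference in LLM-as-judge false-positive rates across corpora. The difference is consistent with reference-solution quality contributing significantly to the gap.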
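The abstract describes inference and analogy claims as audit-replayable when LLM-backed. One way to get that property is record-and-replay. At attestation time, each live oracle verdict is logged under a hash of the exact query. Later, the auditor answers from the log alone. This is an assumption, not Pramana's documented mechanism, and `RecordingOracle`, `ReplayOracle`, and `export_log` are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Dict


def query_key(query: str) -> str:
    # Stable key for a verification query: SHA-256 of its UTF-8 bytes.
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


class RecordingOracle:
    """Wraps a live (possibly nondeterministic) judge and logs its verdicts."""

    def __init__(self, live: Callable[[str], bool]):
        self.live = live
        self.log: Dict[str, bool] = {}

    def __call__(self, query: str) -> bool:
        verdict = self.live(query)
        self.log[query_key(query)] = verdict  # the latest verdict for a query wins
        return verdict


class ReplayOracle:
    """Offline replay: answers only from a recorded log and never calls a model."""

    def __init__(self, log: Dict[str, bool]):
        self.log = dict(log)

    def __call__(self, query: str) -> bool:
        key = query_key(query)
        if key not in self.log:
            raise KeyError(f"no recorded verdict for query {key[:12]}")
        return self.log[key]


def export_log(oracle: RecordingOracle) -> str:
    # JSON so the log can travel with the attestation to an offline auditor.
    return json.dumps(oracle.log, sort_keys=True)
```

Replay guarantees that the auditor sees the same verdict the agent saw. It does not guarantee that the verdict was correct. That second question stays with the oracle's quality.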

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standardized attestations could allow regulators to mandate auditable agent behavior without dictating internal model architectures.
Interoperability across agent frameworks becomes feasible once every consequential output carries a self-describing verification artifact.
The same four-type structure might be applied to non-LLM agents or hybrid human-AI decision pipelines that must also produce verifiable records.
Extending the typology with domain-specific claim subtypes could be tested by adding new variants to the TLA+ model and re-running the TLC checks.

Load-bearing premise

The four-way typology drawn from classical Indian epistemology is sufficient to classify and support verification of every consequential output that autonomous agents will produce in regulated domains.

What would settle it

A concrete consequential agent output that cannot be assigned to any of the four claim types while still preserving a complete, replayable audit trail.

Figures

Figures reproduced from arXiv: 2605.20312 by Ravi Kiran Kadaboina.

**Figure 2.** Detection vs FPR by ensemble structure and aggregation rule, with 95% ...

**Figure 3.** McNemar discordant-cell counts on the clean slice for the two significant pair ...

**Figure 4.** Detection rate per adversarial transformation, same-family majority. The ...
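Figure 3 reports McNemar discordant-cell counts. When two judges are compared on the same cases, only the discordant cells matter. Call b the cases where the first judge flags and the second does not, and c the reverse. The sketch below computes the continuity-corrected chi-square form of the test using only the standard library. The summary does not say whether the paper uses this form or the exact binomial variant, so treat this as illustrative.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar test from the two discordant counts.

    Returns (chi-square statistic with 1 degree of freedom, two-sided p-value).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = max(0, abs(b - c) - 1) ** 2 / n
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `mcnemar(20, 5)` gives a statistic of 7.84 and a p-value of about 0.005. When b + c is small, the exact binomial test is usually preferred over this approximation.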

Original abstract

Autonomous agents deployed in regulated domains must produce a verification artifact per consequential output: a record an auditor can re-execute offline, capturing what was claimed, against what source, by whom, when, and how. Production verification today splits into two unstandardized halves. Probabilistic verdict patterns (self-consistency voting, reviewer LLM ensembles) produce judgments, not artifacts. Artifact-producing patterns (RAG, tool-augmented traces, generator-verifier loops) produce vendor-specific records no external auditor can reconstruct without bespoke integration. Pramana defines the missing wire format. Every consequential agent output is wrapped in a typed ClaimAttestation with one of four variants (measurement, inference, analogy, citation), each paired with a verify() operation against the recorded source. verify() is deterministic for MeasurementClaim and CitationClaim. For InferenceClaim and AnalogyClaim, determinism is conditional on the oracle (audit-replayable when LLM-backed). The four-way typology derives from classical Indian epistemology (pramana, valid means of knowledge). The lifecycle is specified in TLA+ and exhaustively verified under TLC across three symmetry-reduced models: 38,563 distinct reachable states, zero invariant violations. The Python reference implementation passes 84 tests. An A2A and MCP wire-extension manifest layers three deployment-grade invariants: reachability, SLA bound, and offline re-verifiability. An exploratory pilot (n=100, 2,275 reviewer calls) probes LLM-as-judge in code generation. The strongest observation is a 40-percentage-point raw FPR delta across corpora, consistent with reference-solution quality contributing significantly. The pilot does not validate Pramana on its own; the structural argument and formal verification do that.
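The headline pilot number is a difference in percentage points, not a relative change. A small sketch of the arithmetic follows. It assumes that "positive" means the judge flags a solution as defective, so a false positive is a correct solution that gets flagged. The counts below are made up for illustration and are not the pilot's data.

```python
from typing import Tuple


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of actually-correct solutions that the judge wrongly flags."""
    negatives = false_positives + true_negatives
    if negatives == 0:
        raise ValueError("FPR is undefined with no negative (correct) cases")
    return false_positives / negatives


def fpr_delta_pp(corpus_a: Tuple[int, int], corpus_b: Tuple[int, int]) -> float:
    """Raw FPR difference in percentage points; each corpus is (FP, TN)."""
    return 100.0 * (false_positive_rate(*corpus_a) - false_positive_rate(*corpus_b))


# Made-up counts: 30 of 100 correct solutions flagged vs 12 of 100.
delta = fpr_delta_pp((30, 70), (12, 88))
print(f"{delta:.1f} pp")
```

The same 18-point gap in this toy example is a 150% relative increase (0.30 vs 0.12), which is why the unit matters when comparing corpora.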

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Pramana gives a concrete ClaimAttestation wire format and a TLA+-verified lifecycle for agent outputs, but the four-variant typology is defined rather than shown to be exhaustive.

Pramana defines a wire format for wrapping consequential agent outputs in a typed ClaimAttestation that uses one of four variants drawn from classical Indian epistemology: measurement, inference, analogy, and citation. Each comes with a verify operation against the recorded source, and the whole lifecycle is captured in a TLA+ specification that TLC checked across 38,563 states with no invariant violations. The Python reference passes 84 tests and the work adds three deployment invariants for A2A and MCP layers. That is the core contribution.

It cleanly separates artifact production from probabilistic verdict methods and aims at offline auditor reconstruction in regulated settings. The formal numbers and test count are real evidence that the state machine behaves as specified, which is more than many protocol sketches deliver. The exploratory pilot is presented honestly as not validating the protocol itself, so the structural claim rests on the spec and invariants.

The main soft spot is the assumption that the four categories are jointly sufficient for every consequential output an agent might produce. The abstract and description give no completeness argument, no enumeration of output classes, and no mapping for hybrid, probabilistic, or emergent cases. That leaves the typology more as a definitional choice than a demonstrated fit.

The paper is for engineers and researchers working on standards and auditability for autonomous agents in compliance-heavy domains. A reader who needs a starting point for reproducible verification artifacts and formal lifecycle modeling would get usable definitions and invariants from it. I would send this to peer review. The formal verification and concrete wire format give referees something substantive to examine even if the typology coverage needs more work.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Pramana as a protocol for claim verification in autonomous agent networks. It defines a typed ClaimAttestation wrapper for every consequential agent output, using one of four variants (measurement, inference, analogy, citation) drawn from classical Indian epistemology, each paired with a verify() operation against a recorded source. The protocol lifecycle is specified in TLA+ and model-checked with TLC across symmetry-reduced models yielding 38,563 reachable states and zero invariant violations. A Python reference implementation passes 84 tests, and an exploratory pilot (n=100) examines LLM-as-judge behavior in code generation without claiming to validate the protocol.

Significance. If the four-variant typology proves sufficient and appropriate for classifying outputs in regulated domains, Pramana would supply a standardized, auditor-reconstructible wire format that addresses the current split between non-artifact probabilistic methods and vendor-specific records. The machine-checked TLA+ specification with exhaustive TLC verification and the passing Python test suite constitute clear strengths supporting the lifecycle state machine.

major comments (2)

[Abstract] The claim that the four variants (measurement, inference, analogy, citation) are jointly exhaustive and appropriate for all consequential outputs produced by autonomous agents in regulated domains is asserted without a completeness argument, enumeration of output classes, or mapping showing why hybrid, probabilistic, or emergent forms fall inside these buckets. This assumption is load-bearing for the assertion that Pramana defines the missing standardized wire format.
[TLA+ model and verification paragraph] The exhaustive TLC check (38,563 states, zero invariant violations) establishes correctness of the lifecycle state machine but does not address the semantic coverage or adequacy of the ClaimAttestation type system itself; the paper should clarify this scope limitation since the typology's sufficiency is central to applicability.

minor comments (1)

The abstract and pilot description appropriately note that the n=100 study does not validate Pramana; however, the presentation of the 40-percentage-point FPR delta could more explicitly separate it from the core formal argument to avoid any implication of empirical support.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and precise comments. We address each major comment below and indicate the revisions we intend to incorporate.

Referee: [Abstract] The claim that the four variants (measurement, inference, analogy, citation) are jointly exhaustive and appropriate for all consequential outputs produced by autonomous agents in regulated domains is asserted without a completeness argument, enumeration of output classes, or mapping showing why hybrid, probabilistic, or emergent forms fall inside these buckets. This assumption is load-bearing for the assertion that Pramana defines the missing standardized wire format.

Authors: The manuscript presents the four variants as the standard categories drawn from classical Indian epistemology (pramana) rather than as a newly proven exhaustive partition of every conceivable agent output. We accept that the abstract does not supply an explicit completeness argument or mapping for hybrids and emergent cases. In the revised version we will qualify the relevant sentence to read that Pramana adopts these four categories as a practical, historically grounded typology for structuring attestations, and we will add a short discussion paragraph noting that hybrid outputs may be represented by multiple attestations or by selecting the dominant variant while acknowledging that future work may be needed for certain emergent forms. revision: yes
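The authors propose that hybrid outputs be represented by several attestations or by the dominant variant. The first option can be sketched as follows. The sketch assumes a hypothetical `Attestation` record whose `output_id` field links the parts. It also assumes a conservative rule: the whole output counts as verified only when every part verifies. The summary above specifies neither the field nor the rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

VARIANTS = ("measurement", "inference", "analogy", "citation")


@dataclass(frozen=True)
class Attestation:
    output_id: str   # hypothetical link back to the single agent output
    variant: str     # one of VARIANTS
    statement: str   # the sub-claim this attestation covers


def attest_hybrid(output_id: str, parts: List[Tuple[str, str]]) -> List[Attestation]:
    """Split one hybrid output into single-variant attestations sharing output_id."""
    attestations = []
    for variant, statement in parts:
        if variant not in VARIANTS:
            raise ValueError(f"unknown claim variant: {variant!r}")
        attestations.append(Attestation(output_id, variant, statement))
    return attestations


def output_verified(attestations: List[Attestation],
                    verify: Callable[[Attestation], bool]) -> bool:
    # Conservative rule: no attestations, or any failing part, means not verified.
    return bool(attestations) and all(verify(a) for a in attestations)
```

Failing closed on an empty list keeps an unattested output from being treated as verified. A dominant-variant policy would instead keep only one part, trading coverage for simplicity.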
Referee: [TLA+ model and verification paragraph] The exhaustive TLC check (38,563 states, zero invariant violations) establishes correctness of the lifecycle state machine but does not address the semantic coverage or adequacy of the ClaimAttestation type system itself; the paper should clarify this scope limitation since the typology's sufficiency is central to applicability.

Authors: The referee is correct. The TLA+ specification and TLC runs verify the operational lifecycle (state transitions, reachability, and the three deployment invariants) but do not encode or check the semantic adequacy of the four claim variants. We will revise the paragraph to state explicitly that the model-checked results confirm correctness of the protocol state machine while the appropriateness of the typology for regulated domains rests on the epistemological grounding and the reference implementation, not on the model checker. revision: yes
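The scope distinction can be made concrete with a small example. TLC enumerates every reachable state of a specification and checks the invariants in each one. Symmetry reduction treats states that differ only by a permutation of interchangeable processes as a single state. The toy Python analogue below does the same for an invented four-phase lifecycle (draft, emitted, verified, failed) with one hypothetical invariant. It is not the paper's TLA+ specification. It proves the invariant over every reachable state of this model, but it says nothing about whether these phases, or the four claim types, are the right ones.

```python
from collections import deque
from typing import Iterator, Tuple

Local = Tuple[str, bool]      # (phase, source_snapshot_recorded)
State = Tuple[Local, ...]     # one Local per attestation


def successors(state: State) -> Iterator[State]:
    """Every state reachable in one step by advancing a single attestation."""
    for i, (phase, snap) in enumerate(state):
        moves = []
        if phase == "draft" and not snap:
            moves.append(("draft", True))                         # record the source
        elif phase == "draft":
            moves.append(("emitted", True))                       # emit the attestation
        elif phase == "emitted":
            moves.extend([("verified", True), ("failed", True)])  # verify() outcome
        for local in moves:
            yield state[:i] + (local,) + state[i + 1:]


def invariant(state: State) -> bool:
    # Toy re-verifiability rule: nothing leaves draft without a recorded source.
    return all(snap for phase, snap in state if phase != "draft")


def canonical(state: State, symmetry: bool) -> State:
    # Attestations are interchangeable, so a sorted tuple stands for all permutations.
    return tuple(sorted(state)) if symmetry else state


def check(n: int, symmetry: bool = False) -> int:
    """Breadth-first search over all reachable states; returns how many were seen."""
    init = canonical(tuple(("draft", False) for _ in range(n)), symmetry)
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            raise AssertionError(f"invariant violated in {state}")
        for nxt in successors(state):
            key = canonical(nxt, symmetry)
            if key not in seen:
                seen.add(key)
                queue.append(key)
    return len(seen)
```

With two attestations, `check(2)` visits 25 states and `check(2, symmetry=True)` visits 15. The difference shows how symmetry reduction shrinks the state space that has to be checked.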

Circularity Check

0 steps flagged

No significant circularity; protocol definition and TLA+ verification are self-contained

The paper presents Pramana as an explicit definition of a wire format and ClaimAttestation typology, with the four categories adopted from classical Indian epistemology as framing rather than derived via equations or prior author results. The central technical contribution is the TLA+ lifecycle specification, which is exhaustively model-checked under TLC (38,563 states, zero violations) and supported by a Python implementation with 84 tests. No load-bearing step reduces by construction to fitted inputs, self-citations, or ansatzes from the authors' own prior work; the verification is independent and externally reproducible. The sufficiency of the typology for all agent outputs is an assumption, but this is a completeness question, not a circular reduction in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that the classical four-type epistemology maps cleanly onto modern agent outputs and that the protocol invariants suffice for regulated use; no numerical free parameters are fitted because this is a specification rather than an empirical model.

axioms (1)

Domain assumption: the four claim types (measurement, inference, analogy, citation) derived from classical Indian epistemology cover all consequential outputs of autonomous agents.
Invoked when defining the ClaimAttestation variants in the abstract.

invented entities (1)

ClaimAttestation (no independent evidence)
purpose: Typed wrapper record that captures what was claimed, against what source, by whom, when, and how for offline audit.
Newly defined data structure that forms the core of the wire format.
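The ledger describes ClaimAttestation as capturing what was claimed, against what source, by whom, when, and how. Below is a hypothetical serialization of those five fields; the field names are invented for this sketch. It shows one way an offline auditor could at least detect alteration. The record is encoded as canonical JSON (sorted keys, fixed separators), hashed, and the digest is stored alongside it. A bare digest only shows integrity, because anyone can recompute it. Binding the record to a signer would need a signature on top, which this sketch leaves out.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical(body: dict) -> bytes:
    # Deterministic encoding: sorted keys, no extra whitespace, UTF-8.
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def make_record(claim: str, source: str, agent: str, method: str,
                when: datetime) -> dict:
    """Build a hypothetical attestation record with a SHA-256 integrity digest."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    body = {
        "what": claim,
        "source": source,
        "by_whom": agent,
        "when": when.astimezone(timezone.utc).isoformat(),
        "how": method,
    }
    return {**body, "digest": hashlib.sha256(canonical(body)).hexdigest()}


def check_record(record: dict) -> bool:
    # Recompute the digest over every field except the digest itself.
    body = {k: v for k, v in record.items() if k != "digest"}
    return hashlib.sha256(canonical(body)).hexdigest() == record.get("digest")
```

Changing any of the five fields after the fact makes `check_record` return False. The auditor needs no network access and no vendor adapter to run this check.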

pith-pipeline@v0.9.0 · 5852 in / 1541 out tokens · 45176 ms · 2026-05-21T02:07:06.861169+00:00

Reference graph

Works this paper leans on

42 extracted references

[1] Agent2Agent (A2A) Protocol Specification. a2a-protocol.org
[2] Model Context Protocol (MCP) Specification. modelcontextprotocol.io
[3] Consumer Financial Protection Bureau. (2023, September 19). Adverse Action Notification Requirements and the Proper Use of the CFPB’s Sample Forms Provided in Regulation B. Circular 2023-03.
[4] Board of Governors of the Federal Reserve System & Office of the Comptroller of the Currency. (2011, April 4). Supervisory Guidance on Model Risk Management. SR 11-7 / OCC Bulletin 2011-12.
[5] New York State Department of Financial Services. (2024, July 11). Use of Artificial Intelligence Systems and External Consumer Data and Information Sources in Insurance Underwriting and Pricing. Insurance Circular Letter No. 7 (2024).
[6] Colorado Division of Insurance. (2023, September 21; effective November 14, 2023; expanded October 15, 2025). Governance and Risk Management Framework Requirements for Insurers’ Use of External Consumer Data and Information Sources, Algorithms, and Predictive Models. Regulation 10-1-1.
[7] U.S. Department of Health and Human Services, Office for Civil Rights, & Centers for Medicare & Medicaid Services. (2024, May 6). Nondiscrimination in Health Programs and Activities, Final Rule. 89 Fed. Reg. 28822 (codified at 45 C.F.R. pt. 92; §92.210 governs patient care decision support tools).
[8] EU Artificial Intelligence Act (Regulation (EU) 2024/1689), Articles 14, 50. Effective August 2, 2026.
[9] General Data Protection Regulation (GDPR), Recital 71.
[10] Kadaboina, R. K. (2026). Anumati: Proof of Adherence as a Formal Consent Model for Autonomous Agent Protocols. arXiv:2604.16524.
[11] Kadaboina, R. K. (2026). Yathartha: A Protocol-Layer Treatment of Jagged Intelligence in Autonomous Agent Networks. Zenodo DOI 10.5281/zenodo.19659633.
[12] Kadaboina, R. K. (2026). Phala: Principal-Declared Welfare Feedback for Autonomous Agent Networks. Zenodo DOI 10.5281/zenodo.19625612.
[13] Kadaboina, R. K. (2026). Pratyahara: A Neural Tissue Defense Model for Detecting Compromised Agents in Multi-Agent Networks. Specification name: NERVE. Zenodo DOI 10.5281/zenodo.19628589.
[14] Kadaboina, R. K. (2026). Sauvidya: An Accessibility Protocol for Agent-to-Principal Interaction in Autonomous Agent Networks. Specification name: PACE. Zenodo DOI 10.5281/zenodo.19633139.
[15] Austin, J., et al. (2021). Program Synthesis with Large Language Models. arXiv:2108.07732. (MBPP dataset.)
[16] Chen, M., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv:2107.03374. (HumanEval dataset.)
[17] Kim, E., Garg, A., Peng, K., & Garg, N. (2025). Correlated Errors in Large Language Models. arXiv:2506.07962.
[18] Li, D., et al. (2025). Preference Leakage: A Contamination Problem in LLM-as-a-Judge. arXiv:2502.01534.
[19] Zheng, L., et al. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv:2306.05685.
[20] Panickssery, A., Bowman, S. R., & Feng, S. (2024). LLM Evaluators Recognize and Favor Their Own Generations. arXiv:2404.13076.
[21] Wang, P., et al. (2023). Large Language Models Are Not Fair Evaluators. arXiv:2305.17926.
[22] Stureborg, R., Alikaniotis, D., & Suhara, Y. (2024). Large Language Models Are Inconsistent and Biased Evaluators. arXiv:2405.01724.
[23] Maloyan, N., Ashinov, B., & Namiot, D. (2025). Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks. arXiv:2505.13348.
[24] Nasr, M., et al. (2025). The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections. arXiv:2510.09023.
[25] Cemri, M., et al. (2025). Why Do Multi-Agent LLM Systems Fail? (MAST). arXiv:2503.13657.
[26] Arafat, J. (2025). Citation-Grounded Code Comprehension. arXiv:2512.12117.
[27] Onweller, H., et al. (2026). Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents. arXiv:2605.06635.
[28] Manakul, P., Liusie, A., & Gales, M. J. F. (2023). SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. arXiv:2303.08896.
[29] Yang, Y., et al. (2025). A Survey of AI Agent Protocols. arXiv:2504.16736.
[30] Romera-Paredes, B., et al. (2024). FunSearch: Mathematical Discoveries from Program Search with Large Language Models. Nature 625, 468-475.
[31] Novikov, A., et al. (2025). AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery. arXiv:2506.13131.
[32] Dell’Acqua, F., McFowland III, E., Mollick, E., Lifshitz-Assaf, H., Kellogg, K. C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality. Harvard Business School Working Paper No. 24-...
[33] Moreau, L., & Missier, P. (eds.). (2013, April 30). PROV-DM: The PROV Data Model. W3C Recommendation. https://www.w3.org/TR/prov-dm/
[34] Lebo, T., Sahoo, S., & McGuinness, D. (eds.). (2013, April 30). PROV-O: The PROV Ontology. W3C Recommendation. https://www.w3.org/TR/prov-o/
[35] Fuller, J., Raman, M., Sage-Gavin, E., & Hines, K. (2021, September). Hidden Workers: Untapped Talent. Harvard Business School Project on Managing the Future of Work, in collaboration with Accenture.
[36] New York City Department of Consumer and Worker Protection. (2023). Automated Employment Decision Tools: Final Rule. 6 RCNY § 5-300 et seq. Implementing Local Law 144 of 2021; enforcement effective July 5, 2023.
[37] EEOC v. iTutorGroup, Inc., No. 1:22-cv-02565 (E.D.N.Y.). Consent decree filed August 9, 2023; settlement $365,000.
[38] Mobley v. Workday, Inc., No. 3:23-cv-00770 (N.D. Cal.). Order on motion to dismiss, July 12, 2024 (vendor liability as “agent” under Title VII, ADEA, ADA); collective action conditionally certified May 16, 2025 (ADEA claims).
[39] W3C. (2025). Verifiable Credentials Data Model v2.0. W3C Recommendation. https://www.w3.org/TR/vc-data-model-2.0/
[40] Mökander, J., Morley, J., Taddeo, M., & Floridi, L. (2021, July). Ethics-Based Auditing of Automated Decision-Making Systems: Nature, Scope, and Limitations. Science and Engineering Ethics, 27(4). DOI 10.1007/s11948-021-00319-4.
[41] Kroll, J. A., Huey, J., Barocas, S., Felten, E. W., Reidenberg, J. R., Robinson, D. G., & Yu, H. (2017). Accountable Algorithms. University of Pennsylvania Law Review, 165(3), 633-705.
[42] Gottweis, J., et al. (2025). Towards an AI co-scientist. arXiv:2502.18864.

Appendix: artifacts and replication. The Pramana reference implementation, TLA+ specifications, A2A and MCP discovery manifests, claim-attestation wire extension, empirical pilot scripts and raw API responses, the PRE-REGISTRATION.md artifact, and per-case adjudication files are ...
