Evidence Markets

Chengqi Zang; Gabriel Andrade; Safwan Hossain; Yiling Chen

arxiv: 2606.07434 · v1 · pith:BB26O5FRnew · submitted 2026-06-05 · 💻 cs.GT


Pith reviewed 2026-06-27 20:11 UTC · model grok-4.3

classification 💻 cs.GT

keywords: evidence markets · prediction markets · logarithmic market scoring rule · incentive compatibility · endogenous resolution · automated market maker · LLM evaluation · market liquidity


The pith

Evidence markets bound platform loss and make truthful evidence reporting an approximately dominant strategy through dynamic liquidity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces evidence markets to overcome two limits of standard prediction markets: they capture beliefs but not the supporting evidence, and they require an external ground truth for resolution. The market raises the liquidity parameter of a logarithmic market scoring rule (LMSR) as higher-quality evidence accumulates, and rewards each submission in proportion to the market's current uncertainty. The authors prove that this design keeps platform loss bounded and admits an equivalent automated market maker implementation. When resolution occurs endogenously from the submitted evidence, truthful reporting of both beliefs and evidence remains an ε-dominant strategy, so the mechanism is ε-dominant-strategy incentive compatible (ε-DSIC). The approach is illustrated throughout with the running example of crowd-sourced evaluation to determine which large language model performs best on a given task.
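For readers unfamiliar with the logarithmic market scoring rule, the sketch below shows the standard fixed-liquidity LMSR: its cost function, the prices it implies, and the cost of a trade when the rule is run as an automated market maker. It is a minimal illustration, not the paper's mechanism: the function names are made up here, and the liquidity parameter `b` is held constant, whereas the paper varies it with evidence quality.

```python
import math

def lmsr_cost(q, b):
    """Standard LMSR cost C(q) = b * log(sum_i exp(q_i / b)), computed stably."""
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))

def lmsr_prices(q, b):
    """Instantaneous prices: a softmax of outstanding shares scaled by 1/b."""
    m = max(q)
    weights = [math.exp((qi - m) / b) for qi in q]
    total = sum(weights)
    return [w / total for w in weights]

def trade_cost(q, delta, b):
    """Amount a trader pays the market maker to move holdings from q to q + delta."""
    q_new = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)

# Three candidate models, no shares sold yet, fixed liquidity b (assumed value).
q = [0.0, 0.0, 0.0]
b = 10.0
prices = lmsr_prices(q, b)               # uniform prices before any trade
cost = trade_cost(q, [5.0, 0.0, 0.0], b)  # buy 5 shares of the first outcome
```

With a fixed `b`, the market maker's worst-case loss is `b * ln(n)` for `n` outcomes, which is the quantity a dynamic-liquidity design has to keep under control as `b` changes.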

Core claim

Evidence markets generalize prediction markets by incorporating evidence submission alongside beliefs. The market employs a logarithmic market scoring rule with a liquidity parameter that increases with the quality of accumulated evidence. This setup ensures bounded platform loss, rewards evidence in proportion to current market uncertainty, and supports implementation via an automated market maker. When the market resolves endogenously based on the submitted evidence, truthful belief and evidence reporting is shown to be an ε-dominant strategy (ε-DSIC). An LLM-as-a-Judge framework with staking is proposed for evidence verification, along with an asynchronous execution algorithm.
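One simple way to read "rewards evidence in proportion to current market uncertainty" is to scale a quality score by the Shannon entropy of the current prices. The sketch below does exactly that. The reward rule, the `scale` parameter, and the function names are assumptions made for illustration; the page does not show the paper's actual reward formula.

```python
import math

def entropy(prices):
    """Shannon entropy (in nats) of the market's current price vector."""
    return -sum(p * math.log(p) for p in prices if p > 0)

def evidence_reward(quality, prices, scale=1.0):
    """Hypothetical reward: quality score times current market uncertainty."""
    return scale * quality * entropy(prices)

# The same piece of evidence earns more when the market is still undecided.
reward_uncertain = evidence_reward(0.8, [0.5, 0.5])
reward_settled = evidence_reward(0.8, [0.95, 0.05])
```

Under this reading, evidence that arrives after the market has largely converged earns little, which matches the intuition that late evidence resolves less uncertainty.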

What carries the argument

The dynamic liquidity parameter in the logarithmic market scoring rule, adjusted according to accumulated evidence quality to reflect reduced uncertainty.
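To see why tying liquidity to evidence quality matters, the sketch below compares the price impact of the same purchase at two liquidity levels. The linear update `b = b0 + alpha * sum(scores)` is a hypothetical stand-in chosen for illustration, since the page does not give the paper's update rule, and `b0`, `alpha`, and the scores are made-up values.

```python
import math

def liquidity(b0, alpha, quality_scores):
    """Hypothetical update: liquidity grows linearly with accumulated quality."""
    return b0 + alpha * sum(quality_scores)

def price_after_buy(shares, b, n_outcomes=2):
    """LMSR price of outcome 0 after buying `shares` of it from an empty market."""
    w0 = math.exp(shares / b)
    return w0 / (w0 + (n_outcomes - 1))

b_early = liquidity(10.0, 5.0, [])               # no evidence yet
b_late = liquidity(10.0, 5.0, [0.9, 0.8, 0.7])   # three well-scored submissions

# Price movement away from the initial 0.5 caused by the same 5-share purchase.
move_early = price_after_buy(5.0, b_early) - 0.5
move_late = price_after_buy(5.0, b_late) - 0.5
```

A larger `b` makes prices harder to move, so a market backed by more high-quality evidence is less sensitive to any single trade.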

Load-bearing premise

Evidence quality can be assessed reliably by the LLM-as-a-Judge with staking without systematic bias or creating new incentive problems.
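The premise above can be made concrete with a toy staking rule: a submitter posts a stake with their evidence, a judge returns a quality score, and the stake is refunded or slashed depending on that score. Everything in this sketch is hypothetical, including the `Submission` type, the threshold, and the all-or-nothing slashing; the page does not describe the paper's staking scheme in detail.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    trader: str
    evidence: str
    stake: float

def settle_stake(sub, judge_score, threshold=0.5):
    """Return (refund, slashed) for a submission given a judge score in [0, 1]."""
    if not 0.0 <= judge_score <= 1.0:
        raise ValueError("judge_score must lie in [0, 1]")
    if judge_score >= threshold:
        return sub.stake, 0.0   # evidence accepted: stake returned in full
    return 0.0, sub.stake       # evidence rejected: stake forfeited

good = settle_stake(Submission("alice", "held-out benchmark results", 2.0), 0.9)
bad = settle_stake(Submission("bob", "unsupported claim", 2.0), 0.1)
```

The sketch also shows where the premise bites: if the judge score is systematically biased, the refund and slashing decisions inherit that bias directly.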

What would settle it

A scenario in which traders submit misleading evidence that receives high quality scores from the judge, causing the market to resolve incorrectly and resulting in platform losses exceeding the bound.
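As a baseline for such a test, the sketch below checks the classical bound for a fixed-liquidity LMSR: over random trade sequences and every possible winning outcome, the platform's loss never exceeds `b * ln(n)`. This only covers the standard constant-`b` case, with made-up trade sizes; the paper's bound under dynamic liquidity and a possibly misled judge is exactly what a real falsification attempt would need to probe.

```python
import math
import random

def lmsr_cost(q, b):
    """Standard LMSR cost function, computed stably."""
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))

def platform_loss(q, winner, b):
    """Payout owed on the winning outcome minus revenue collected from trades."""
    revenue = lmsr_cost(q, b) - lmsr_cost([0.0] * len(q), b)
    return q[winner] - revenue

def worst_loss(n=3, b=10.0, trials=1000, seed=0):
    """Largest loss seen over random trade sequences and all possible winners."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        q = [0.0] * n
        for _ in range(20):
            # Negative amounts model selling (or short-selling) an outcome.
            q[rng.randrange(n)] += rng.uniform(-5.0, 20.0)
        for winner in range(n):
            worst = max(worst, platform_loss(q, winner, b))
    return worst

bound = 10.0 * math.log(3)
observed = worst_loss(n=3, b=10.0)
```

An analogous harness with a quality-dependent `b` and an adversarial judge would be one way to search for the counterexample described above.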

Original abstract

Modern prediction markets face two limitations that restrict their applicability in a range of settings: (i) they reveal what the crowd believes but not the evidence or reasoning behind those beliefs, and (ii) they require an event with an external ground truth that resolves at a known future date. We address these twin challenges by introducing evidence markets, a generalization of prediction markets that incentivizes the submission of evidence alongside beliefs and can be endogenously resolved using the crowd-sourced evidence if external resolution is not possible. At its core, the market uses a logarithmic market scoring rule whose liquidity parameter changes dynamically with the accumulated evidence quality. We prove that platform loss is bounded, evidence is rewarded proportional to the current market uncertainty, and can be equivalently implemented through an automated market maker. In the case where the marker resolves endogenously based on submitted evidence, we characterize how withholding evidence shifts a trader's belief about resolution and use it to prove truthful belief and evidence reporting is a always an ε-dominant strategy incentive compatible (DSIC) strategy. To address operational considerations, we propose evidence verification via an LLM-as-a-Judge framework with staking and give an asynchronous execution algorithm that is not bottle-necked by verification. Throughout the work, we use LLM evaluations, determining which model is best for a given task, as a salient and representative running example for our proposed market.
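The abstract's point about execution not being bottlenecked by verification can be illustrated with a small `asyncio` sketch: evidence is handed to a background verification task, trades are recorded without waiting for it, and liquidity is updated only when a verdict arrives. This is an assumption-laden toy, not the paper's algorithm. The `verify` stand-in, its length-based score, the event format, and the additive liquidity update are all hypothetical.

```python
import asyncio

async def verify(evidence):
    """Stand-in for a slow judge call; returns a made-up quality score in [0, 1]."""
    await asyncio.sleep(0.01)
    return min(1.0, len(evidence) / 100)

async def run_market(events):
    state = {"b": 10.0, "log": []}
    pending = []

    async def verify_and_update(evidence):
        score = await verify(evidence)
        state["b"] += score  # hypothetical liquidity update on a verdict
        state["log"].append(("verified", evidence))

    for kind, payload in events:
        if kind == "trade":
            # Trades are recorded immediately at the current liquidity.
            state["log"].append(("trade", payload))
        else:
            # Evidence is queued for background verification; the loop never
            # awaits it, so later trades do not wait for the judge.
            pending.append(asyncio.create_task(verify_and_update(payload)))

    await asyncio.gather(*pending)  # drain outstanding verifications
    return state

events = [
    ("evidence", "benchmark run A"),
    ("trade", "buy 5 of model X"),
    ("trade", "sell 2 of model Y"),
]
state = asyncio.run(run_market(events))
```

In this toy run both trades are logged before the verdict on the earlier evidence lands, which is the ordering an asynchronous design permits and a blocking design would not.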

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Evidence markets add dynamic liquidity and endogenous resolution to prediction markets, but the ε-DSIC claim depends on an unverified LLM judge preserving the exact belief-shift model used in the proof.


The paper's core move is to let traders report evidence alongside beliefs, tie the market's liquidity parameter to accumulated evidence quality, and allow the market to resolve internally from the submitted evidence when no external ground truth exists. This directly targets two standard prediction-market limits: they show beliefs but not the backing evidence, and they need an external resolution date.

What stands out as new is the specific combination of quality-adjusted liquidity with endogenous resolution, plus the characterization of how withholding evidence changes a trader's view of the resolution probability. The authors claim this yields bounded platform loss, evidence rewards scaled to current uncertainty, and an ε-DSIC truthful strategy. They also sketch an LLM-as-a-Judge framework with staking and an asynchronous execution path.

The practical framing around LLM model evaluation is a reasonable running example. The mechanism design elements look like a clean generalization rather than a minor tweak.

The main soft spot is that the incentive result rests on the evidence-quality signal being reliable enough to keep the belief-update characterization intact. Any consistent bias or extra variance from the LLM judge would alter effective resolution probabilities outside the analyzed model, and the abstract gives no derivation or edge-case check showing the bound survives. Since the proofs are asserted but not visible here, the central formal claims remain uncheckable from the provided text. The staking proposal is mentioned but not stress-tested against judge collusion or low-stake attacks.

This is aimed at mechanism-design people working on information elicitation and at groups building evaluation markets for AI systems. It is distinct enough and formally stated enough to merit a serious referee, even though the verification step will need close scrutiny in review.

Referee Report

2 major / 2 minor

Summary. The paper introduces evidence markets as a generalization of prediction markets that incentivizes submission of evidence alongside beliefs, using a logarithmic market scoring rule with a liquidity parameter that changes dynamically based on accumulated evidence quality. It claims to prove bounded platform loss, evidence rewards proportional to market uncertainty, and equivalence to an automated market maker implementation. For endogenous resolution based on submitted evidence, it characterizes how withholding evidence shifts a trader's beliefs and uses that characterization to establish that truthful belief and evidence reporting is always an ε-DSIC strategy. It also proposes an LLM-as-a-Judge framework with staking for evidence verification and an asynchronous execution algorithm, using LLM model selection as a running example.

Significance. If the bounded-loss and ε-DSIC results hold with the stated characterizations, the work would meaningfully extend prediction-market mechanisms to settings lacking external ground truth, such as AI capability evaluation, by incorporating evidence submission and endogenous resolution. The dynamic liquidity adjustment and AMM equivalence are concrete technical contributions. The asynchronous algorithm addresses a practical deployment issue. These strengths are noted, but the central incentive result depends on unverified properties of the LLM judge.

major comments (2)

[Abstract] Abstract: the claim that 'truthful belief and evidence reporting is always an ε-DSIC strategy' when the market resolves endogenously rests on a characterization of how withholding evidence shifts a trader's belief about resolution; no derivation, edge-case analysis, or explicit functional form is supplied, so it is impossible to verify whether the dynamic liquidity adjustment (tied to LLM-assessed evidence quality) preserves the belief-update model needed for the ε bound to survive.
[Abstract] Abstract (endogenous-resolution paragraph): the ε-DSIC result requires that the LLM-as-a-Judge supplies unbiased evidence-quality scores sufficient to adjust the liquidity parameter without introducing systematic bias or variance that alters effective resolution probabilities outside the analyzed belief-shift model; no argument or lemma is given showing that the judge preserves the exact functional form used to bound deviation gains.

minor comments (2)

[Abstract] Abstract contains two typographical errors: 'the marker resolves' should read 'the market resolves' and 'is a always an' should read 'is always an'.
The precise definition of ε in the ε-DSIC notion (and how it scales with market parameters) should be stated explicitly, as it is load-bearing for the incentive claim.
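
For orientation, the textbook form of ε-dominant-strategy incentive compatibility is given below. This is the generic definition, not necessarily the one the paper uses: the symbols $u_i$ (trader $i$'s utility), $\tau_i$ (the truthful strategy of reporting both belief and evidence), $s_i'$ (any deviation), and $s_{-i}$ (the other traders' strategies) are notation assumed here, and the paper may take the expectation with respect to a specific belief model or let ε depend on market parameters.

```latex
\[
  \mathbb{E}\big[u_i(\tau_i, s_{-i})\big]
  \;\ge\;
  \mathbb{E}\big[u_i(s_i', s_{-i})\big] - \varepsilon
  \qquad \text{for all } s_i' \text{ and all } s_{-i}.
\]
```

Setting $\varepsilon = 0$ recovers exact DSIC, which is why the size of ε and its dependence on the liquidity parameter matter for the strength of the claim.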

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and will make the requested clarifications and additions in a revised version.


Referee: [Abstract] Abstract: the claim that 'truthful belief and evidence reporting is always an ε-DSIC strategy' when the market resolves endogenously rests on a characterization of how withholding evidence shifts a trader's belief about resolution; no derivation, edge-case analysis, or explicit functional form is supplied, so it is impossible to verify whether the dynamic liquidity adjustment (tied to LLM-assessed evidence quality) preserves the belief-update model needed for the ε bound to survive.

Authors: The manuscript states the characterization of belief shifts from withholding evidence and the resulting ε-DSIC proof in Section 4. We agree, however, that the abstract provides only a high-level summary without the explicit functional form, edge-case analysis, or separate verification that dynamic liquidity updates preserve the belief-update model. In revision we will move the full derivation and functional form into the main text (or a dedicated appendix subsection), add edge-case analysis, and include a new lemma establishing that liquidity-parameter updates based on accumulated evidence quality maintain the required properties for the ε bound to hold.

revision: yes

Referee: [Abstract] Abstract (endogenous-resolution paragraph): the ε-DSIC result requires that the LLM-as-a-Judge supplies unbiased evidence-quality scores sufficient to adjust the liquidity parameter without introducing systematic bias or variance that alters effective resolution probabilities outside the analyzed belief-shift model; no argument or lemma is given showing that the judge preserves the exact functional form used to bound deviation gains.

Authors: The ε-DSIC theorem is derived under the modeling assumption that evidence-quality scores are supplied to the liquidity rule; the LLM-as-a-Judge framework with staking is offered only as a practical implementation for obtaining those scores. We acknowledge that no separate lemma or argument is supplied showing that an LLM judge necessarily preserves the exact functional form. In revision we will add an explicit discussion of the modeling assumptions on the judge, the role of staking in discouraging systematic bias, and a clear statement of the conditions under which the ε bound continues to apply; we will also note the possibility of unmodeled variance as a limitation.

revision: partial

Circularity Check

0 steps flagged

No circularity: derivation relies on explicit market rules and belief characterizations without reduction to inputs


The abstract and description present a market design using logarithmic scoring with dynamic liquidity, followed by proofs of bounded loss and an ε-DSIC result obtained by characterizing belief shifts from evidence withholding. No equations or steps are shown that define a quantity in terms of itself, rename a fit as a prediction, or rest the central claim on a self-citation chain. The LLM-judge component is an operational proposal rather than a load-bearing premise inside the incentive proof. The derivation chain is therefore self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies insufficient detail to enumerate free parameters, axioms, or invented entities; the dynamic liquidity parameter is referenced but not quantified or derived.

pith-pipeline@v0.9.1-grok · 5766 in / 1015 out tokens · 23966 ms · 2026-06-27T20:11:05.359482+00:00


