arxiv: 2606.04217 · v2 · pith:BIAG7H2Lnew · submitted 2026-06-02 · 💻 cs.CE · q-fin.ST · q-fin.TR

Polymarket-v1 Database

Pith reviewed 2026-06-28 07:44 UTC · model grok-4.3

classification 💻 cs.CE · q-fin.ST · q-fin.TR
keywords prediction markets · microstructure · aggressor classification · VPIN · Brier score · on-chain data · Gibbs spread · trade direction

The pith

Ground-truth aggressor direction from the blockchain settlement layer shows that true VPIN predicts Brier scores while classified proxies do not.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles the full on-chain trade record for Polymarket's first exchange, yielding 1.2 billion records with exact aggressor flags taken directly from settlement rather than inferred. Standard classifiers such as the tick rule and bulk volume method achieve only near-random accuracy because prediction markets exhibit persistent direction autocorrelation and concentrated market-making, violating the mean-reversion premise those tools assume. These label errors distort VPIN and order-flow imbalance (OFI) enough to obscure their statistical links to market calibration. With the accurate labels the authors recover a positive relation between true VPIN and Brier scores and a negative relation between Gibbs spread and Brier scores; both relations weaken sharply when the same metrics are computed from classified data instead.
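For readers unfamiliar with the calibration measure, the sketch below shows the standard Brier score for binary outcomes: the mean squared gap between a probability forecast and the realized 0/1 outcome, so lower is better. It is a minimal illustration only. Which market price the paper treats as the forecast, and at what horizon, is not stated in this summary; the function name `brier_score` is a hypothetical choice.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes.

    forecasts: probabilities in [0, 1] (e.g. a market price, by assumption)
    outcomes:  realized results coded as 0 or 1
    Lower values indicate better-calibrated, sharper forecasts.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# A perfect forecaster scores 0; an uninformative 50/50 forecast scores 0.25.
perfect = brier_score([1.0, 0.0], [1, 0])
coin_flip = brier_score([0.5, 0.5], [1, 0])
```

Under this definition, a positive relation between VPIN and the Brier score means higher VPIN goes with worse calibration.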

Core claim

The complete on-chain archive supplies 100 percent ground-truth aggressor direction unavailable in prior prediction-market data sets. Tick-rule and bulk-volume classifiers achieve only 49.83 percent and 50.51 percent aggregate accuracy, with systematic price-level bias arising from positive trade-direction autocorrelation and concentrated market-making. These errors cause inferred VPIN to diverge from ground-truth VPIN and bias OFI estimates. Ground-truth VPIN positively predicts Brier scores while Gibbs spread negatively predicts them, yet the same relationships are materially attenuated when ground-truth metrics are replaced by classified proxies.
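The benchmark described above can be sketched as follows, assuming the textbook tick rule (an uptick is labeled a buy, a downtick a sell, and a zero tick inherits the previous label). This is not the authors' code; the function names and the toy price and label series are hypothetical, and the paper's handling of the first trade and of zero ticks is not described here.

```python
def tick_rule(prices):
    """Classify each trade as +1 (buy) or -1 (sell) from price changes alone.

    Trades before the first price change are left unclassified (0).
    """
    labels = []
    last = 0  # carried forward on zero ticks
    for i, price in enumerate(prices):
        if i > 0:
            if price > prices[i - 1]:
                last = 1
            elif price < prices[i - 1]:
                last = -1
        labels.append(last)
    return labels


def accuracy(predicted, truth):
    """Share of classified trades whose label matches the ground-truth flag."""
    pairs = [(p, t) for p, t in zip(predicted, truth) if p != 0]
    if not pairs:
        raise ValueError("no classified trades to score")
    return sum(p == t for p, t in pairs) / len(pairs)


# Toy series: the ground-truth flags here are made up for illustration.
toy_prices = [0.50, 0.51, 0.51, 0.50, 0.52]
toy_truth = [1, 1, -1, -1, -1]
toy_labels = tick_rule(toy_prices)
toy_accuracy = accuracy(toy_labels, toy_truth)
```

With settlement-derived flags in place of `toy_truth`, the same comparison yields the kind of aggregate accuracy figure reported in the paper.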

What carries the argument

The 100 percent ground-truth aggressor direction extracted from the blockchain settlement layer, used both to benchmark classical classifiers and to validate microstructure metrics against subsequent forecast accuracy.

If this is right

  • Classification errors propagate directly into VPIN and OFI, producing biased transaction-cost estimates.
  • Higher true VPIN goes with worse (higher) Brier scores, consistent with informed volume coinciding with poorer calibration.
  • Wider Gibbs spreads go with better (lower) Brier scores, consistent with high-spread markets drawing informed specialists rather than noise traders.
  • Any study that substitutes classified proxies for ground-truth metrics will understate the strength of microstructure-forecast linkages.
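To make the first point concrete, here is a minimal sketch of how a direction label feeds into VPIN, assuming the common volume-bucket form: trades are packed into equal-volume buckets and VPIN is the average of |buy volume − sell volume| / bucket volume over the most recent buckets. The bucket size, window length, and function name are illustrative assumptions, not parameters taken from the paper.

```python
def vpin(trades, bucket_volume, window):
    """Average absolute order imbalance over the last `window` volume buckets.

    trades: iterable of (volume, direction), direction +1 for buyer-initiated
            and -1 for seller-initiated.
    A trade that overflows a bucket is split across consecutive buckets.
    """
    imbalances = []
    buy = sell = filled = 0.0
    for volume, direction in trades:
        remaining = float(volume)
        while remaining > 0:
            take = min(remaining, bucket_volume - filled)
            if direction > 0:
                buy += take
            else:
                sell += take
            filled += take
            remaining -= take
            if bucket_volume - filled <= 1e-12:  # bucket is full
                imbalances.append(abs(buy - sell) / bucket_volume)
                buy = sell = filled = 0.0
    recent = imbalances[-window:]
    if not recent:
        raise ValueError("not enough volume to fill a single bucket")
    return sum(recent) / len(recent)


# One mislabeled trade (the 40-unit sell recorded as a buy) changes the result.
true_vpin = vpin([(60, 1), (40, -1), (100, 1)], bucket_volume=100, window=2)
mislabeled_vpin = vpin([(60, 1), (40, 1), (100, 1)], bucket_volume=100, window=2)
```

In this toy case a single flipped label moves VPIN from 0.6 to 1.0, which is the propagation channel the bullet describes.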

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Prediction-market platforms may need classifiers explicitly adjusted for persistent direction autocorrelation rather than relying on equity-market defaults.
  • The same ground-truth labels could be used to train market-specific classifiers that recover accurate VPIN at scale.
  • On-chain settlement data from other decentralized prediction or betting venues would allow direct tests of whether the same classification failures appear outside Polymarket.

Load-bearing premise

The on-chain settlement layer supplies 100 percent accurate aggressor direction for every trade record without extraction errors or ambiguities.

What would settle it

Re-estimating the VPIN-Brier and Gibbs-Brier regressions on the same markets after replacing ground-truth labels with tick-rule labels at the observed error rate and finding that the slope coefficients remain statistically indistinguishable from the ground-truth results.
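The mechanics of such a test can be sketched on synthetic data. The example below is an illustration under strong assumptions, not a replication: the outcome variable is simulated, labels are flipped independently at a rate of 1 − 0.4983 (the complement of the tick-rule accuracy quoted above), and a plain one-variable OLS slope stands in for the paper's regressions. Real tick-rule errors are reported to be systematically biased by price level, so independent flips are a simplification. All names are hypothetical.

```python
import random


def flip_labels(directions, error_rate, rng):
    """Flip each +1/-1 label independently with probability `error_rate`."""
    return [-d if rng.random() < error_rate else d for d in directions]


def abs_imbalance(directions):
    """Absolute net direction as a share of trade count."""
    return abs(sum(directions)) / len(directions)


def ols_slope(x, y):
    """Slope of a one-variable least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def simulate(error_rate, n_buckets=400, trades_per_bucket=50, seed=0):
    """Compare slopes estimated from true labels and from noisy labels."""
    rng = random.Random(seed)
    x_true, x_noisy, y = [], [], []
    for _ in range(n_buckets):
        p_buy = rng.uniform(0.2, 0.8)
        true_dirs = [1 if rng.random() < p_buy else -1
                     for _ in range(trades_per_bucket)]
        noisy_dirs = flip_labels(true_dirs, error_rate, rng)
        xt = abs_imbalance(true_dirs)
        x_true.append(xt)
        x_noisy.append(abs_imbalance(noisy_dirs))
        # Synthetic outcome: depends on the TRUE imbalance by construction.
        y.append(2.0 * xt + rng.gauss(0.0, 0.05))
    return ols_slope(x_true, y), ols_slope(x_noisy, y)


slope_true, slope_noisy = simulate(error_rate=1 - 0.4983)
```

In this setup the noisy-label slope shrinks toward zero because near-50-percent flips leave the proxy almost uncorrelated with the true imbalance. Slopes that stayed indistinguishable on the real data would instead count against the paper's attenuation claim.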

Figures

Figures reproduced from arXiv: 2606.04217 by Boka Qin, Rui Yang.

Figure 8. Direction Autocorrelation by Price Bin. Horizontal axis: price bin (0.00-0.01 through 0.95-0.96); vertical axis: direction autocorrelation ρ (−1.00 to 1.00). Source caption (numbered Figure 10 in the paper): Trade direction autocorrelation ρ = Corr(D_t, D_{t−1}) by price bin, with m…
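The quantity plotted in this figure, ρ = Corr(D_t, D_{t−1}) within each price bin, can be computed along the following lines. This is a sketch under assumptions: bins are taken to be one cent wide, each consecutive pair of trades is assigned to the bin of the later trade's price, and the function names are hypothetical. The paper's exact binning and pairing rules are not given in this summary.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation; returns None when either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def direction_autocorr_by_bin(prices, directions):
    """Lag-1 autocorrelation of trade direction, grouped by one-cent price bin.

    prices:     trade prices in [0, 1]
    directions: +1 / -1 aggressor flags aligned with `prices`
    Returns {bin_index_in_cents: rho}, where bin 50 covers 0.50-0.51.
    """
    pairs = defaultdict(list)
    for t in range(1, len(directions)):
        # Small epsilon guards against float error at exact cent boundaries.
        cent_bin = min(math.floor(prices[t] * 100 + 1e-9), 99)
        pairs[cent_bin].append((directions[t - 1], directions[t]))
    return {
        b: pearson([p for p, _ in pl], [c for _, c in pl])
        for b, pl in pairs.items()
    }


# Strictly alternating directions give rho = -1 in the single occupied bin.
alternating = direction_autocorr_by_bin([0.505] * 5, [1, -1, 1, -1, 1])
```

Positive values of ρ in a bin indicate that buys tend to follow buys and sells follow sells there, the persistence the paper identifies as a source of classifier error.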

Original abstract

We introduce the Polymarket-v1 Database: the complete on-chain trade archive of Polymarket's first-generation CTF Exchange on Polygon, spanning 2022-11-21 to 2026-04-28 and covering the full contract lifecycle from first settlement to natural termination. The dataset comprises 1.20 billion trade records across 1.30 million markets with $61 billion in nominal volume. Its defining feature is 100% ground-truth aggressor direction derived from the blockchain settlement layer, a property unavailable in existing prediction market archives, which rely on heuristic inference. We use this truth-aligned archive to benchmark standard microstructure tools and document three findings. First, the tick rule and bulk volume classification achieve near-random aggregate accuracy (49.83% and 50.51%), but this masks a systematic, correctable price-level gradient driven by positive trade direction autocorrelation and concentrated market-making -- two structural features of prediction markets that violate the mean-reversion assumption embedded in classical classifiers. Second, these classification errors propagate into downstream metrics: inferred VPIN diverges substantially from ground-truth VPIN, and OFI estimates are directionally biased, with material consequences for Transaction Cost Analysis. Third, ground-truth microstructure quality predicts forecasting performance in ways that classification-based proxies cannot recover: True VPIN positively predicts Brier scores, while Gibbs spread negatively predicts them -- a selection effect reflecting that high-spread niche markets attract informed specialists rather than noise traders. Replacing ground-truth metrics with classified proxies attenuates both relationships, illustrating that measurement accuracy at the transaction level is a prerequisite for reliable inference about prediction market design and probability calibration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the Polymarket-v1 Database: 1.20 billion on-chain trade records from Polymarket's CTF Exchange on Polygon (2022-11-21 to 2026-04-28) across 1.30 million markets and $61 billion nominal volume. Its core contribution is the provision of 100% ground-truth aggressor direction extracted from the blockchain settlement layer. Using this archive the authors benchmark the tick rule (49.83% accuracy) and bulk volume classification (50.51% accuracy), attribute the near-random performance to positive trade-direction autocorrelation and concentrated market-making, document propagation of classification errors into VPIN and OFI, and report that ground-truth VPIN positively and Gibbs spread negatively predict Brier scores while classification-based proxies attenuate both relationships.

Significance. If the ground-truth aggressor flags are verifiably error-free, the database supplies a large-scale, externally validated resource for prediction-market microstructure that is unavailable in existing archives. The documented divergences between inferred and true VPIN/OFI, together with the differential predictive power for Brier scores, would constitute concrete evidence that transaction-level direction accuracy is a prerequisite for reliable inference on forecasting performance and market design.

major comments (2)
  1. [Abstract] The claim that the dataset supplies '100% ground-truth aggressor direction derived from the blockchain settlement layer' is load-bearing for every accuracy number, divergence result, and Brier-score relationship, yet the manuscript supplies no description of the extraction algorithm, handling of atomic multi-leg settlements, partial fills, or contract-event decoding ambiguities.
  2. [Abstract] The statements that 'True VPIN positively predicts Brier scores, while Gibbs spread negatively predicts them' and that 'Replacing ground-truth metrics with classified proxies attenuates both relationships' are presented without reference to the underlying statistical specifications, sample construction, or robustness checks, preventing assessment of whether these selection-effect interpretations are supported by the data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed reading and constructive comments on the abstract. Both points identify areas where additional clarity would strengthen the manuscript. We address each below and will revise accordingly.

Point-by-point responses
  1. Referee: [Abstract] The claim that the dataset supplies '100% ground-truth aggressor direction derived from the blockchain settlement layer' is load-bearing for every accuracy number, divergence result, and Brier-score relationship, yet the manuscript supplies no description of the extraction algorithm, handling of atomic multi-leg settlements, partial fills, or contract-event decoding ambiguities.

    Authors: We agree that the abstract does not describe the extraction procedure. The full manuscript contains a methods section that specifies the on-chain event decoding logic, the treatment of atomic multi-leg settlements as single transactions, the identification of partial fills via cumulative fill events, and the resolution of contract-event ambiguities through the CTF settlement contract ABI. To make this transparent at the point of first reading, we will add a single sentence to the abstract summarizing the extraction approach and will include an explicit cross-reference to the methods section. revision: yes

  2. Referee: [Abstract] The statements that 'True VPIN positively predicts Brier scores, while Gibbs spread negatively predicts them' and that 'Replacing ground-truth metrics with classified proxies attenuates both relationships' are presented without reference to the underlying statistical specifications, sample construction, or robustness checks, preventing assessment of whether these selection-effect interpretations are supported by the data.

    Authors: The abstract condenses results that are fully specified in the empirical section: market-day panel regressions of Brier score on VPIN and Gibbs spread with market-type fixed effects, volume controls, and robustness to alternative sample windows and winsorization. The attenuation result is shown via side-by-side coefficient comparisons. We will revise the abstract to include brief parenthetical references to the regression specification and sample definition, thereby directing readers to the supporting details without lengthening the abstract excessively. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical database and benchmarks are self-contained

Full rationale

The paper introduces an on-chain trade archive and uses its claimed ground-truth aggressor flags to benchmark tick-rule and bulk-volume classifiers, then reports divergences in VPIN/OFI and correlations between true microstructure metrics and Brier scores. No derivation chain reduces a claimed prediction or result to a fitted parameter or self-citation by construction; the central findings are direct empirical comparisons against an external data source rather than self-referential equations or renamed inputs. The work is a standard empirical contribution whose validity hinges on the accuracy of the blockchain extraction, not on internal definitional loops.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper's claims rest on the assumption that the extracted on-chain records are complete and correctly label aggressor direction for the entire period.

axioms (1)
  • domain assumption Blockchain settlement layer supplies 100% accurate aggressor direction for every trade
    This property is stated as the defining feature that distinguishes the archive from heuristic-based datasets.
