
arxiv: 2604.18130 · v2 · submitted 2026-04-20 · 💻 cs.LG · cs.CE · stat.AP

An 'Inverse' Experimental Framework to Estimate Market Efficiency

Pith reviewed 2026-05-10 05:25 UTC · model grok-4.3

classification 💻 cs.LG · cs.CE · stat.AP
keywords allocative efficiency · double auctions · orderbook data · market prediction · experimental economics · machine learning · quantile normalization · inverse framework

The pith

Machine learning models can predict allocative efficiency in double auctions from bids, asks, and realized prices alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper inverts the usual experimental economics setup to build predictive models that estimate how well a market allocates goods without knowing the true buyer and seller reservation values. Using only observable orderbook information, the approach trains linear regressions and gradient boosting trees on data from controlled auctions in which the induced reservation values are withheld from the model. Quantile-based normalization helps handle the unstructured, non-stationary nature of bids and asks. The resulting models predict allocative efficiency with reasonable accuracy from the first bids and asks, with further gains as prices are realized. This matters because real digital marketplaces provide only such orderbook data, so the method offers a way to gauge performance and inform governance without direct access to private valuations.
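The summary above names gradient boosting trees among the model families. As a rough illustration of that technique only, and not the authors' implementation (their code uses its own features and settings), here is a squared-loss gradient-boosting regressor built from one-feature decision stumps in pure Python. The single input `x` stands in for one hypothetical normalized orderbook statistic; all function names are assumptions.

```python
from statistics import fmean


def fit_stump(x, r):
    """Return (threshold, left_value, right_value) minimizing squared error on residuals r."""
    xs = sorted(set(x))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring feature values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = fmean(left), fmean(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all feature values identical: fall back to a constant stump
        m = fmean(r)
        return (float("inf"), m, m)
    return best[1:]


def fit_boosted_stumps(x, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the loss (1/2)(y - F)^2 the negative gradient is simply y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    """Sum the base value and the damped contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lv if xi <= t else rv) for t, lv, rv in stumps)
```

Each round adds a damped copy of a stump fitted to the residuals, so the ensemble approaches the targets gradually. For the linear-regression counterpart on a single feature, the standard library's `statistics.linear_regression` (Python 3.10+) returns the slope and intercept of an ordinary least-squares fit.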

Core claim

By training on experimental double-auction data while withholding the induced reservation values, the authors demonstrate that quantile-normalized bids, asks, and price realizations suffice to predict allocative efficiency with reasonable accuracy; accuracy rises as more price data arrives, and the approach applies across market types, though performance varies by target and market type.

What carries the argument

Quantile-based normalization of orderbook inputs fed into predictive models (linear regression and gradient boosting trees) that map to efficiency outcomes without being given the induced reservation values; in the experiments, the normalization itself draws on the known supply-demand schedule.
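The exact normalization is not spelled out here. As one plausible reading, assumed rather than taken from the paper, each bid or ask can be mapped to its position in the empirical distribution of a reference set of prices (in the lab, for instance, the induced supply-demand schedule; in the field, an estimate of it). A minimal sketch of that empirical-CDF transform, with a hypothetical function name:

```python
from bisect import bisect_right
from collections.abc import Sequence


def quantile_normalize(prices: Sequence[float], reference: Sequence[float]) -> list[float]:
    """Map each price to the share of reference values at or below it, a number in [0, 1].

    The transform is rank-based, so markets quoted on different price scales
    become comparable once each is normalized against its own reference set.
    """
    ref = sorted(reference)
    if not ref:
        raise ValueError("reference must contain at least one value")
    return [bisect_right(ref, p) / len(ref) for p in prices]
```

For example, `quantile_normalize([5, 20, 35, 50], [10, 20, 30, 40])` gives `[0.0, 0.5, 0.75, 1.0]`. Because only ranks matter, multiplying both prices and reference by the same positive constant leaves the output unchanged, which is one way such a scheme could cope with non-stationary price levels.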

If this is right

  • Allocative efficiency becomes estimable before any trades occur using only the earliest bids and asks.
  • Prediction accuracy increases once realized prices are included in the inputs.
  • The framework can be applied to different market types, though accuracy differs by setting and target metric.
  • Real-world digital marketplaces can be monitored for efficiency using only the orderbook data they already produce.
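The first two consequences describe an information schedule: a prediction from the opening bids and asks, refined once trades print. A hypothetical sketch of how feature vectors for the two stages might be assembled; the statistic names, the cut-off `k`, and the choice of best quotes, medians and spreads are illustrative assumptions, not the paper's feature set.

```python
from statistics import median


def stage_features(bids, asks, prices=None, k=5):
    """Summarize the first k bids and asks; append realized-price statistics if any exist.

    Works on raw or already normalized prices.
    """
    early_bids, early_asks = bids[:k], asks[:k]
    features = {
        "best_bid": max(early_bids),
        "best_ask": min(early_asks),
        "median_bid": median(early_bids),
        "median_ask": median(early_asks),
    }
    features["spread"] = features["best_ask"] - features["best_bid"]
    if prices:  # later stage: trades have been realized
        features["n_trades"] = len(prices)
        features["median_price"] = median(prices)
        features["price_range"] = max(prices) - min(prices)
    return features
```

A model for the early stage would be trained on the first five keys only; the later-stage model sees the extra price statistics, which is where the reported accuracy gains would have to come from.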

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same models could be tested on field data from actual trading platforms to check transfer beyond the lab.
  • Platforms might use early efficiency forecasts to adjust matching rules or fees in real time.
  • Similar inversion techniques could extend to other market formats such as posted-price or combinatorial auctions.

Load-bearing premise

Patterns learned from lab experiments with known induced values will transfer to real markets where true willingness-to-pay and willingness-to-sell are unobserved and participant behavior may differ.
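One way to probe this premise inside the lab is to perturb how quotes relate to reservation values, in the spirit of the controlled strategic deviations (such as bid shading) discussed in the rebuttal further down. The sketch applies a hypothetical proportional shading rule: buyers bid below and sellers ask above their reservation values by a factor `shade`. Both the rule and its parameter are assumptions for illustration.

```python
def shade_quotes(values, costs, shade=0.1):
    """Return (bids, asks) where buyers understate values and sellers overstate costs.

    shade=0 reproduces truthful quoting; larger shade widens the opening spread.
    """
    if not 0.0 <= shade < 1.0:
        raise ValueError("shade must be in [0, 1)")
    bids = [v * (1.0 - shade) for v in values]
    asks = [c * (1.0 + shade) for c in costs]
    return bids, asks
```

Feeding orderbooks generated under different `shade` levels to a model trained on unshaded data, and comparing its errors, is a simple stress test of whether learned patterns survive a change in behavior.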

What would settle it

Apply the trained models to a fresh set of double-auction experiments, compute the actual allocative efficiency from the known induced values, and check whether the predictions match within the accuracy levels reported in the paper.
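This check needs the realized allocative efficiency of each experimental market. In the induced-value literature it is commonly defined as realized gains from trade divided by the maximum attainable gains (the competitive-equilibrium surplus). A minimal sketch, assuming single-unit traders and that each executed trade is recorded as a hypothetical (buyer value, seller cost) pair:

```python
def max_surplus(values, costs):
    """Largest total gains from trade: pair highest values with lowest costs while value exceeds cost."""
    total = 0.0
    for v, c in zip(sorted(values, reverse=True), sorted(costs)):
        if v <= c:
            break  # every later pair is also unprofitable
        total += v - c
    return total


def allocative_efficiency(trades, values, costs):
    """Realized surplus over maximum surplus.

    Transaction prices cancel: buyer surplus (v - p) plus seller surplus (p - c) is v - c.
    """
    best = max_surplus(values, costs)
    if best == 0:
        raise ValueError("no gains from trade are possible")
    realized = sum(v - c for v, c in trades)
    return realized / best
```

With values `[10, 8, 6]` and costs `[2, 4, 7]` the maximum surplus is (10 - 2) + (8 - 4) = 12; if the trades actually executed were (8, 2) and (6, 4), realized surplus is 6 + 2 = 8 and efficiency is 8/12, about 0.67.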

Original abstract

Digital marketplaces processing billions of dollars annually represent critical infrastructure in sociotechnical ecosystems, yet their performance optimization lacks principled measurement frameworks that can inform algorithmic governance decisions regarding market efficiency and fairness from complex market data. By looking at orderbook data from double auction markets alone, because bids and asks do not represent true maximum willingnesses to buy and true minimum willingnesses to sell, there is little an economist can say about the market's actual performance in terms of allocative efficiency. We turn to experimental data to address this issue, 'inverting' the standard induced value approach of double auction experiments. Our aim is to predict key market features relevant to market efficiency, particularly allocative efficiency, using orderbook data only (specifically bids, asks and price realizations, but not the induced reservation values) as early as possible. Since there is no established model of strategically optimal behavior in these markets, and because orderbook data is highly unstructured, non-stationary and non-linear, we propose quantile-based normalization techniques that help us build general predictive models. We develop and train several models, including linear regressions and gradient boosting trees, leveraging quantile-based input from the underlying supply-demand model. Our models can predict allocative efficiency with reasonable accuracy from the earliest bids and asks, and these predictions improve with additional realized price data. The performance of the prediction techniques varies by target and market type. Our framework holds significant potential for application to real-world market data, offering valuable insights into market efficiency and performance, even prior to any trade realizations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes an 'inverse' experimental framework that trains machine learning models on double-auction laboratory data to predict allocative efficiency from orderbook information alone (bids, asks, and realized prices), without using the induced reservation values that serve as ground truth in standard experiments. Quantile-based normalization is applied to the inputs, and both linear regression and gradient-boosting models are trained; the authors report that these models achieve 'reasonable accuracy' from the earliest bids and asks, with accuracy improving as realized prices become available, and suggest the approach can be transferred to real-world markets where true valuations remain unobserved.

Significance. If the learned mapping generalizes, the framework would supply a practical method for estimating allocative efficiency in field markets where willingness-to-pay and willingness-to-sell are private and no ground-truth efficiency label exists. The use of controlled experiments to supervise predictors for an otherwise unobservable quantity is a creative inversion of the induced-value paradigm and could inform algorithmic market design. At present, however, the significance is limited because all reported results remain within the induced-value laboratory setting and no quantitative performance figures or generalization tests are supplied.

major comments (3)
  1. [Abstract and Results] The central performance claim is stated only as 'reasonable accuracy' with no accompanying metrics (R², MAE, RMSE), confidence intervals, or description of evaluation protocol (train/test splits, cross-validation, or number of markets). Without these numbers the strength of the predictive result cannot be assessed and the claim that predictions 'improve with additional realized price data' remains unquantified.
  2. [Methods] Quantile normalization: the normalization step is described as leveraging 'quantile-based input from the underlying supply-demand model.' In any real-market deployment the true supply and demand schedules are unknown, so it is unclear how the same normalization could be performed; this directly affects the applicability claim made in the abstract.
  3. [Evaluation and Discussion] No out-of-distribution experiments are reported that alter the distribution of induced values or introduce non-induced strategic behavior. Because the training data are generated under controlled induced-value conditions, the absence of such tests leaves the transferability to real markets (where valuations are private and behavior may differ) unsupported.
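The first major comment asks for R², MAE and RMSE under an explicit protocol. A minimal sketch of both pieces, written from the textbook metric definitions; the function names and the market-level holdout rule are illustrative assumptions. Splitting by market rather than by row keeps snapshots of the same auction from landing on both sides of the split, which would otherwise inflate test scores.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,  # raises ZeroDivisionError if all targets are equal
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


def split_by_market(market_ids, test_share=0.25, seed=0):
    """Assign whole markets to train or test so that no market contributes to both."""
    markets = sorted(set(market_ids))
    random.Random(seed).shuffle(markets)
    n_test = max(1, round(test_share * len(markets)))
    test_markets = set(markets[:n_test])
    train_idx = [i for i, m in enumerate(market_ids) if m not in test_markets]
    test_idx = [i for i, m in enumerate(market_ids) if m in test_markets]
    return train_idx, test_idx
```

Repeating `split_by_market` with different seeds gives a simple grouped cross-validation, and the spread of the resulting metrics is one rough basis for the confidence intervals the referee requests.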
minor comments (1)
  1. [Abstract] The quantile levels and model hyperparameters are free parameters of the framework, but the abstract does not report the specific values or ranges used in the reported experiments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments identify important gaps in quantification, applicability, and generalizability that we address below. We have revised the manuscript to incorporate quantitative metrics, additional discussion of normalization adaptations, and new experiments on out-of-distribution performance.

Point-by-point responses
  1. Referee: [Abstract and Results] The central performance claim is stated only as 'reasonable accuracy' with no accompanying metrics (R², MAE, RMSE), confidence intervals, or description of evaluation protocol (train/test splits, cross-validation, or number of markets). Without these numbers the strength of the predictive result cannot be assessed and the claim that predictions 'improve with additional realized price data' remains unquantified.

    Authors: We agree that the performance claims require explicit quantitative support. In the revised manuscript we have expanded the abstract and results section to report R², MAE, and RMSE values for both linear regression and gradient-boosting models at successive stages of data availability (early bids/asks versus with realized prices). We also describe the evaluation protocol, including the number of markets, train/test splits, and cross-validation procedure, together with confidence intervals. These additions make the improvement in predictive accuracy with additional price data directly measurable. revision: yes

  2. Referee: [Methods] Quantile normalization: the normalization step is described as leveraging 'quantile-based input from the underlying supply-demand model.' In any real-market deployment the true supply and demand schedules are unknown, so it is unclear how the same normalization could be performed; this directly affects the applicability claim made in the abstract.

    Authors: The referee correctly notes that the quantile normalization as originally described relies on the known experimental supply and demand schedules. We have added a new subsection in the methods that explains how empirical quantiles can be estimated from historical orderbook data in field settings. We have also revised the abstract and discussion to qualify the applicability claim, explicitly stating that the normalization step becomes an approximation when true valuations are unobserved and discussing the resulting limitations. revision: partial

  3. Referee: [Evaluation and Discussion] No out-of-distribution experiments are reported that alter the distribution of induced values or introduce non-induced strategic behavior. Because the training data are generated under controlled induced-value conditions, the absence of such tests leaves the transferability to real markets (where valuations are private and behavior may differ) unsupported.

    Authors: We acknowledge that the original manuscript did not include out-of-distribution tests. In the revised version we have added experiments that vary the distribution of induced values and introduce controlled strategic deviations (e.g., bid shading). These results are reported in a new subsection of the evaluation, with accompanying discussion of how performance changes and what this implies for transfer to real markets where behavior may differ from the laboratory setting. revision: yes
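The second response proposes estimating quantiles from historical orderbook data when the true schedules are unknown. A hypothetical sketch of that idea: keep a bounded window of recently observed quotes and use it as the reference distribution for normalization. The class name, the window length, and the pooling of bids with asks into one history are assumptions, and the result only approximates the lab normalization, as the response itself notes.

```python
from bisect import bisect_right, insort
from collections import deque


class RollingReference:
    """Bounded history of observed quotes used as an empirical reference distribution."""

    def __init__(self, window=1000):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self._recent = deque()  # quotes in arrival order, for eviction
        self._sorted = []       # the same quotes kept sorted, for rank lookups

    def update(self, quote):
        """Add a quote; drop the oldest one once the window is full."""
        self._recent.append(quote)
        insort(self._sorted, quote)
        if len(self._recent) > self.window:
            oldest = self._recent.popleft()
            # bisect_right - 1 points at one stored copy of the evicted value
            self._sorted.pop(bisect_right(self._sorted, oldest) - 1)

    def normalize(self, price):
        """Share of stored quotes at or below price."""
        if not self._sorted:
            raise ValueError("no history yet")
        return bisect_right(self._sorted, price) / len(self._sorted)
```

The window length trades responsiveness to drifting price levels against the noise of estimating quantiles from few observations.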

Circularity Check

0 steps flagged

No significant circularity; predictive models are trained independently of target labels

Full rationale

The paper trains supervised models (linear regression, gradient boosting) on experimental double-auction data to predict allocative efficiency. Inputs are restricted to bids, asks, and realized prices; the efficiency target is computed separately from induced reservation values that are never supplied to the model. This is a standard feature-to-label mapping rather than any self-definitional reduction, fitted-input-as-prediction, or self-citation chain. Quantile normalization is applied to the orderbook features using the known experimental supply-demand schedule, but the resulting predictions remain non-tautological and falsifiable against held-out experimental outcomes. No load-bearing step collapses the claimed accuracy into the inputs by construction.
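The rationale turns on a separation: features see only the orderbook, while the label is computed from induced values the model never receives. A hypothetical sketch of that separation, using a zero-intelligence-constrained (ZI-C) double auction in the spirit of Gode and Sunder: traders quote uniformly at random within their budget constraint, and the simulator returns the public orderbook and, separately, the efficiency label. The trading rule (one random quote per step, execution at the standing quote when prices cross, book cleared after each trade) and the price cap are simplifications chosen for brevity, not the paper's experimental protocol.

```python
import random


def max_surplus(values, costs):
    """Maximum gains from trade for single-unit traders: pair highest values with lowest costs."""
    pairs = zip(sorted(values, reverse=True), sorted(costs))
    return sum(v - c for v, c in pairs if v > c)


def simulate_zi_c(values, costs, steps=500, price_cap=200.0, seed=0):
    """Simplified single-unit ZI-C double auction.

    Returns (orderbook, efficiency). The orderbook holds only public events
    ("bid", "ask", "trade") with prices; values and costs enter only the label.
    Assumes at least one profitable buyer-seller pair exists.
    """
    rng = random.Random(seed)
    buyers, sellers = set(range(len(values))), set(range(len(costs)))
    best_bid = best_ask = None  # standing quotes as (price, trader index)
    orderbook, realized = [], 0.0
    for _ in range(steps):
        if not buyers or not sellers:
            break
        trade = None
        if rng.random() < 0.5:
            b = rng.choice(sorted(buyers))
            price = rng.uniform(0.0, values[b])  # ZI-C: never bid above value
            orderbook.append(("bid", price))
            if best_ask is not None and price >= best_ask[0]:
                trade = (best_ask[0], b, best_ask[1])  # execute at the standing ask
            elif best_bid is None or price > best_bid[0]:
                best_bid = (price, b)
        else:
            s = rng.choice(sorted(sellers))
            price = rng.uniform(costs[s], price_cap)  # ZI-C: never ask below cost
            orderbook.append(("ask", price))
            if best_bid is not None and price <= best_bid[0]:
                trade = (best_bid[0], best_bid[1], s)  # execute at the standing bid
            elif best_ask is None or price < best_ask[0]:
                best_ask = (price, s)
        if trade is not None:
            price, buyer, seller = trade
            orderbook.append(("trade", price))
            realized += values[buyer] - costs[seller]
            buyers.discard(buyer)
            sellers.discard(seller)
            best_bid = best_ask = None  # clear the book after each transaction
    return orderbook, realized / max_surplus(values, costs)
```

Features would be computed from `orderbook` alone, while `efficiency` serves as the training label, so the held-out comparison stays non-tautological in the sense described above.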

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on the transferability of experimental patterns to real markets and on the sufficiency of quantile statistics to capture supply-demand structure without explicit values. No new entities are postulated.

free parameters (2)
  • quantile levels
    Choice of quantiles for normalization is a modeling decision that affects input features and is not derived from first principles.
  • model hyperparameters
    Gradient boosting and regression parameters are tuned on experimental data.
axioms (2)
  • domain assumption: Orderbook data contains sufficient statistical signal about underlying supply and demand to predict allocative efficiency
    Invoked when claiming that bids, asks, and prices alone suffice for prediction.
  • domain assumption: Experimental double-auction behavior generalizes to real marketplaces
    Required for the stated application potential.
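The ledger flags the quantile levels as a free parameter. To make that concrete, here is a small sketch, assuming the linear-interpolation quantile definition (the default method of `numpy.quantile`), that turns a list of prices into one feature per chosen level; the function names and the default levels are hypothetical, and changing `levels` changes the model inputs directly.

```python
import math


def quantile(sorted_xs, q):
    """Linear-interpolation quantile of already-sorted data, for q in [0, 1]."""
    if not sorted_xs:
        raise ValueError("empty data")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    pos = (len(sorted_xs) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def quantile_features(prices, levels=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """One feature per quantile level; the levels themselves are a modeling choice."""
    xs = sorted(prices)
    return [quantile(xs, q) for q in levels]
```

For prices `[5, 1, 4, 2, 3]`, levels `(0.25, 0.5, 0.9)` give features 2, 3 and about 4.6; a coarser or finer grid of levels would produce a different input vector for the same market.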

pith-pipeline@v0.9.0 · 5571 in / 1313 out tokens · 42240 ms · 2026-05-10T05:25:35.176154+00:00 · methodology


The Python code and the pre-processed data are publicly available at https://github.com/asikist/inverse_experimental_markets.