An 'Inverse' Experimental Framework to Estimate Market Efficiency
Pith reviewed 2026-05-10 05:25 UTC · model grok-4.3
The pith
Machine learning models can predict allocative efficiency in double auctions from bids, asks, and realized prices alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training on experimental double-auction data while withholding the induced reservation values, the authors demonstrate that quantile-normalized bids, asks, and price realizations suffice to predict allocative efficiency with reasonable accuracy. Accuracy rises as more price data arrives, and the approach applies across market types, though performance varies by target variable.
What carries the argument
Quantile-based normalization of orderbook inputs fed into predictive models (linear regression and gradient boosting trees) that map to efficiency outcomes without access to the true supply-demand curves.
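As a rough illustration of what a quantile-based normalization could look like, the sketch below maps raw bid and ask prices to their empirical quantile within a reference sample of prices. This is a generic rank transform and an assumption on our part: the paper derives its quantile inputs from the underlying supply-demand model, which is not reproduced here. The function name `empirical_quantile` and all sample prices are hypothetical.

```python
from bisect import bisect_right


def empirical_quantile(price, reference_sorted):
    """Return the share of reference prices that are <= price (a value in [0, 1])."""
    if not reference_sorted:
        raise ValueError("reference sample must not be empty")
    return bisect_right(reference_sorted, price) / len(reference_sorted)


# Hypothetical reference sample of prices, sorted ascending.
reference = sorted([80, 90, 95, 100, 105, 110, 120, 130])

# Hypothetical early bids and asks in raw currency units.
bids = [85, 100, 112]
asks = [125, 108, 96]

# Scale-free features: each order is replaced by its quantile position.
bid_q = [empirical_quantile(b, reference) for b in bids]
ask_q = [empirical_quantile(a, reference) for a in asks]

print(bid_q)
print(ask_q)
```

Because the transformed features lie in [0, 1] regardless of the price level, markets with different currency scales can in principle share one predictive model, which is the motivation the abstract gives for normalization.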
If this is right
- Allocative efficiency becomes estimable before any trades occur using only the earliest bids and asks.
- Prediction accuracy increases once realized prices are included in the inputs.
- The framework can be applied to different market types, though accuracy differs by setting and target metric.
- Real-world digital marketplaces can be monitored for efficiency using only the orderbook data they already produce.
Where Pith is reading between the lines
- The same models could be tested on field data from actual trading platforms to check transfer beyond the lab.
- Platforms might use early efficiency forecasts to adjust matching rules or fees in real time.
- Similar inversion techniques could extend to other market formats such as posted-price or combinatorial auctions.
Load-bearing premise
Patterns learned from lab experiments with known induced values will transfer to real markets where true willingness-to-pay and willingness-to-sell are unobserved and participant behavior may differ.
What would settle it
Apply the trained models to a fresh set of double-auction experiments, compute the actual allocative efficiency from the known induced values, and check whether the predictions match within the accuracy levels reported in the paper.
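The ground-truth side of this check can be sketched as follows, using the standard definition of allocative efficiency as realized gains from trade divided by the maximum attainable gains. The sketch assumes single-unit traders and hypothetical induced values; it is not taken from the paper's code, and the function names are ours.

```python
def max_surplus(buyer_values, seller_costs):
    """Maximum gains from trade: pair the highest values with the lowest costs."""
    values = sorted(buyer_values, reverse=True)
    costs = sorted(seller_costs)
    # Pairs are in order of decreasing (value - cost), so keeping only the
    # positive ones is the same as stopping at the first unprofitable pair.
    return sum(v - c for v, c in zip(values, costs) if v > c)


def allocative_efficiency(trades, buyer_values, seller_costs):
    """Realized surplus over maximum surplus.

    `trades` holds one (buyer_value, seller_cost) pair per unit traded.
    The transaction price cancels out: (v - p) + (p - c) = v - c.
    """
    best = max_surplus(buyer_values, seller_costs)
    if best == 0:
        raise ValueError("no gains from trade are possible")
    realized = sum(v - c for v, c in trades)
    return realized / best


# Hypothetical induced values (assumed single-unit buyers and sellers).
buyer_values = [100, 90, 80, 70]
seller_costs = [60, 75, 85, 95]

# Hypothetical realized trades, including one extra-marginal seller (cost 85).
trades = [(90, 60), (100, 85)]

efficiency = allocative_efficiency(trades, buyer_values, seller_costs)
print(round(efficiency, 3))
```

A model's prediction for the same market would then be compared against `efficiency`, which is only computable because the induced values are known in the laboratory.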
Original abstract
Digital marketplaces processing billions of dollars annually represent critical infrastructure in sociotechnical ecosystems, yet their performance optimization lacks principled measurement frameworks that can inform algorithmic governance decisions regarding market efficiency and fairness from complex market data. By looking at orderbook data from double auction markets alone, because bids and asks do not represent true maximum willingnesses to buy and true minimum willingnesses to sell, there is little an economist can say about the market's actual performance in terms of allocative efficiency. We turn to experimental data to address this issue, 'inverting' the standard induced value approach of double auction experiments. Our aim is to predict key market features relevant to market efficiency, particularly allocative efficiency, using orderbook data only (specifically bids, asks and price realizations, but not the induced reservation values) as early as possible. Since there is no established model of strategically optimal behavior in these markets, and because orderbook data is highly unstructured, non-stationary and non-linear, we propose quantile-based normalization techniques that help us build general predictive models. We develop and train several models, including linear regressions and gradient boosting trees, leveraging quantile-based input from the underlying supply-demand model. Our models can predict allocative efficiency with reasonable accuracy from the earliest bids and asks, and these predictions improve with additional realized price data. The performance of the prediction techniques varies by target and market type. Our framework holds significant potential for application to real-world market data, offering valuable insights into market efficiency and performance, even prior to any trade realizations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an 'inverse' experimental framework that trains machine learning models on double-auction laboratory data to predict allocative efficiency from orderbook information alone (bids, asks, and realized prices), without using the induced reservation values that serve as ground truth in standard experiments. Quantile-based normalization is applied to the inputs, and both linear regression and gradient-boosting models are trained; the authors report that these models achieve 'reasonable accuracy' from the earliest bids and asks, with accuracy improving as realized prices become available, and suggest the approach can be transferred to real-world markets where true valuations remain unobserved.
Significance. If the learned mapping generalizes, the framework would supply a practical method for estimating allocative efficiency in field markets where willingness-to-pay and willingness-to-sell are private and no ground-truth efficiency label exists. The use of controlled experiments to supervise predictors for an otherwise unobservable quantity is a creative inversion of the induced-value paradigm and could inform algorithmic market design. At present, however, the significance is limited because all reported results remain within the induced-value laboratory setting and no quantitative performance figures or generalization tests are supplied.
major comments (3)
- [Abstract and Results] The central performance claim is stated only as 'reasonable accuracy' with no accompanying metrics (R², MAE, RMSE), confidence intervals, or description of evaluation protocol (train/test splits, cross-validation, or number of markets). Without these numbers the strength of the predictive result cannot be assessed and the claim that predictions 'improve with additional realized price data' remains unquantified.
- [Methods] Quantile normalization: the normalization step is described as leveraging 'quantile-based input from the underlying supply-demand model.' In any real-market deployment the true supply and demand schedules are unknown, so it is unclear how the same normalization could be performed; this directly affects the applicability claim made in the abstract.
- [Evaluation and Discussion] No out-of-distribution experiments are reported that alter the distribution of induced values or introduce non-induced strategic behavior. Because the training data are generated under controlled induced-value conditions, the absence of such tests leaves the transferability to real markets (where valuations are private and behavior may differ) unsupported.
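The metrics requested in the first major comment are simple to compute once held-out efficiencies and predictions are available. The sketch below implements MAE, RMSE, and R² in plain Python and compares two hypothetical prediction stages. All numbers are invented for illustration and are not results from the paper.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out allocative efficiencies (one value per market).
actual = [0.90, 0.80, 1.00, 0.70]

# Hypothetical predictions at two stages of data availability.
early_stage = [0.80, 0.90, 0.90, 0.80]   # bids and asks only
later_stage = [0.85, 0.80, 0.95, 0.75]   # realized prices added

for label, pred in [("early", early_stage), ("later", later_stage)]:
    print(label, mae(actual, pred), rmse(actual, pred), r_squared(actual, pred))
```

Reporting these figures per stage, on markets not seen during training, would make the claimed improvement with additional price data directly checkable.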
minor comments (1)
- [Abstract] Quantile levels and model hyperparameters are free parameters of the approach, but the abstract does not report the specific values or ranges used in the reported experiments.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed report. The comments identify important gaps in quantification, applicability, and generalizability that we address below. We have revised the manuscript to incorporate quantitative metrics, additional discussion of normalization adaptations, and new experiments on out-of-distribution performance.
Point-by-point responses
- Referee: [Abstract and Results] The central performance claim is stated only as 'reasonable accuracy' with no accompanying metrics (R², MAE, RMSE), confidence intervals, or description of evaluation protocol (train/test splits, cross-validation, or number of markets). Without these numbers the strength of the predictive result cannot be assessed and the claim that predictions 'improve with additional realized price data' remains unquantified.
  Authors: We agree that the performance claims require explicit quantitative support. In the revised manuscript we have expanded the abstract and results section to report R², MAE, and RMSE values for both linear regression and gradient-boosting models at successive stages of data availability (early bids/asks versus with realized prices). We also describe the evaluation protocol, including the number of markets, train/test splits, and cross-validation procedure, together with confidence intervals. These additions make the improvement in predictive accuracy with additional price data directly measurable.
  Revision: yes
- Referee: [Methods] Quantile normalization: the normalization step is described as leveraging 'quantile-based input from the underlying supply-demand model.' In any real-market deployment the true supply and demand schedules are unknown, so it is unclear how the same normalization could be performed; this directly affects the applicability claim made in the abstract.
  Authors: The referee correctly notes that the quantile normalization as originally described relies on the known experimental supply and demand schedules. We have added a new subsection in the methods that explains how empirical quantiles can be estimated from historical orderbook data in field settings. We have also revised the abstract and discussion to qualify the applicability claim, explicitly stating that the normalization step becomes an approximation when true valuations are unobserved and discussing the resulting limitations.
  Revision: partial
- Referee: [Evaluation and Discussion] No out-of-distribution experiments are reported that alter the distribution of induced values or introduce non-induced strategic behavior. Because the training data are generated under controlled induced-value conditions, the absence of such tests leaves the transferability to real markets (where valuations are private and behavior may differ) unsupported.
  Authors: We acknowledge that the original manuscript did not include out-of-distribution tests. In the revised version we have added experiments that vary the distribution of induced values and introduce controlled strategic deviations (e.g., bid shading). These results are reported in a new subsection of the evaluation, with accompanying discussion of how performance changes and what this implies for transfer to real markets where behavior may differ from the laboratory setting.
  Revision: yes
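To make the idea of a controlled strategic deviation concrete, the sketch below applies a uniform shading factor to hypothetical induced values and counts how many orders still cross. The multiplicative shading rule and the function names are assumptions chosen for illustration; the rebuttal does not specify how bid shading would be implemented.

```python
def shade_orders(buyer_values, seller_costs, shade):
    """Shade bids below values and asks above costs by a common factor."""
    if not 0.0 <= shade < 1.0:
        raise ValueError("shade must lie in [0, 1)")
    bids = [v * (1.0 - shade) for v in buyer_values]
    asks = [c * (1.0 + shade) for c in seller_costs]
    return bids, asks


def crossing_units(bids, asks):
    """Count units that could trade if the best bids met the best asks."""
    best_bids = sorted(bids, reverse=True)
    best_asks = sorted(asks)
    return sum(1 for b, a in zip(best_bids, best_asks) if b >= a)


# Hypothetical induced values (assumed single-unit traders).
buyer_values = [100, 90, 80, 70]
seller_costs = [60, 75, 85, 95]

# Number of crossing units under increasing (hypothetical) shading levels.
crossings = {}
for shade in (0.0, 0.1, 0.3):
    bids, asks = shade_orders(buyer_values, seller_costs, shade)
    crossings[shade] = crossing_units(bids, asks)

print(crossings)
```

Stronger shading shifts the observed orderbook away from the underlying values, so a model trained on one shading regime and tested on another gives a simple out-of-distribution check of the kind the referee asks for.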
Circularity Check
No significant circularity; the models' inputs exclude the induced values from which the target labels are computed
Full rationale
The paper trains supervised models (linear regression, gradient boosting) on experimental double-auction data to predict allocative efficiency. Inputs are restricted to bids, asks, and realized prices; the efficiency target is computed separately, from induced reservation values that are never supplied to the model directly. This is a standard feature-to-label mapping rather than any self-definitional reduction, fitted-input-as-prediction, or self-citation chain. Quantile normalization is applied to the orderbook features using the known experimental supply-demand schedule, but the resulting predictions remain non-tautological and falsifiable against held-out experimental outcomes. No load-bearing step collapses the claimed accuracy into the inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (2)
- quantile levels
- model hyperparameters
axioms (2)
- domain assumption: Orderbook data contains sufficient statistical signal about underlying supply and demand to predict allocative efficiency
- domain assumption: Experimental double-auction behavior generalizes to real marketplaces
Reference graph
Works this paper leans on
- [1] Patrick Bajari and Ali Hortacsu. Are structural estimates of auction models reasonable? Evidence from experimental data. Journal of Political Economy, 113(4):703–741, 2005.
- [2] Antonio Briola, Jeremy Turiel, and Tomaso Aste. Deep learning modeling of limit order book: A comparative perspective, 2020.
- [3] Dave Cliff and Janet Bruten. Zero is not enough: On the lower limit of agent intelligence for continuous double auction markets. Technical Report HPL-97-141, Hewlett-Packard Laboratories, 1997.
- [4] James C Cox and Ronald L Oaxaca. Direct tests of the reservation wage property. The Economic Journal, 102(415):1423–1432, 1992.
- [5] James C Cox and Ronald L Oaxaca. Tests for a Reservation Wage Effect, pages 171–177. Springer Netherlands, Dordrecht, 1992.
- [6] James C Cox and Ronald L Oaxaca. Testing job search models: The laboratory approach. Research in Labor Economics, 15:171–207, 1996.
- [7] Eugene F Fama. Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2):383–417, 1970.
- [8] Artur J Ferreira and Mário A T Figueiredo. Boosting Algorithms: A Review of Methods, Theory, and Applications, pages 35–85. Springer New York, New York, NY, 2012. ISBN 978-1-4419-9326-7. doi: 10.1007/978-1-4419-9326-7_2. URL https://doi.org/10.1007/978-1-4419-9326-7_2
- [9] Daniel Friedman. The double auction market: institutions, theories, and evidence. Routledge, 2018.
- [10] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
- [11] Drew Fudenberg and Annie Liang. Predicting and understanding initial play. American Economic Review, 109(12):4112–4141, 2019.
- [12] Yasser Ganjisaffar, Rich Caruana, and Cristina Videira Lopes. Bagging gradient-boosted trees for high precision, low variance ranking models. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pages 85–94, 2011.
- [13] Dhananjay K Gode and Shyam Sunder. Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. Journal of Political Economy, 101(1):119–137, 1993.
- [14] Philip A Haile and Elie Tamer. Inference with an incomplete model of English auctions. Journal of Political Economy, 111(1):1–51, 2003.
- [15] Peter J Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, pages 73–101, 1964.
- [16] Barbara Ikica, Simon Jantschgi, Heinrich H Nax, Diego G Nuñez Duran, and Bary SR Pradelski. Competitive market behavior: convergence and asymmetry in the experimental double auction. International Economic Review, 64(3):1087–1126. doi: https://doi.org/10.1111/iere.12630
- [18] Sabiou M Inoua and Vernon L Smith. Perishable goods versus re-tradable assets: A theoretical reappraisal of a fundamental dichotomy. In Handbook of Experimental Finance, pages 162–171. Edward Elgar Publishing, 2022.
- [19] Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, et al. An introduction to statistical learning, volume 112. Springer, 2013.
- [20] Yujing Jiang, Mei-Ling Ting Lee, Bernard Rosner, and Jun Yan. Wilcoxon rank-based tests for clustered data with R package clusrank. Journal of Statistical Software, 96(6):1–26, 2020. doi: 10.18637/jss.v096.i06
- [21] Daniel Keniston. Bargaining and welfare: A dynamic structural analysis. Report, Yale University. [221], 2011.
- [22] Alec N Kercheval and Yuan Zhang. Modelling high-frequency limit order book dynamics with support vector machines. Quantitative Finance, 15(8):1315–1329, 2015.
- [23] Bradley Larsen and Anthony Lee Zhang. A mechanism design approach to identification and estimation. Technical report, National Bureau of Economic Research, 2018.
- [24] Bradley J Larsen. The efficiency of real-world bargaining: Evidence from wholesale used-auto auctions. The Review of Economic Studies, 88(2):851–882, 2021.
- [25] Huihui Li. Nonparametric identification of k-double auctions using price data. Working Paper, 2015.
- [26] Po-Hsuan Lin, Alexander L Brown, Taisuke Imai, Joseph Tao-yi Wang, Stephanie W Wang, and Colin F Camerer. Evidence of general economic principles of bargaining and trade from 2,000 classroom experiments. Nature Human Behaviour, 4(9):917–927, 2020.
- [27] Paraskevi Nousi, Avraam Tsantekidis, Nikolaos Passalis, Adamantios Ntakaris, Juho Kanniainen, Anastasios Tefas, Moncef Gabbouj, and Alexandros Iosifidis. Machine learning for forecasting mid-price movements using limit order book data. IEEE Access, 7:64722–64736, 2019.
- [28] Adamantios Ntakaris, Martin Magris, Juho Kanniainen, Moncef Gabbouj, and Alexandros Iosifidis. Benchmark dataset for mid-price forecasting of limit order book data with machine learning methods. Journal of Forecasting, 37(8):852–866, 2018.
- [29] Bernard Rosner, Robert J Glynn, and Mei-Ling Ting Lee. Use of the Wilcoxon signed-rank test for clustered data. Biometrics, 55(4):1258–1264, 1999.
- [30] Aldo Rustichini, Mark Satterthwaite, and Steven Williams. Convergence to efficiency in a simple market with incomplete information. Econometrica, 62:1041–1063, 1994.
- [31] Tobias Schoenherr and Vincent A Mabert. The use of bundling in B2B online reverse auctions. Journal of Operations Management, 26(1):81–95, 2008.
- [32] Vernon L Smith. An experimental study of competitive market behavior. Journal of Political Economy, 70(2):111–137, 1962.
- [33] Vernon L Smith. Experimental economics: Induced value theory. The American Economic Review, 66(2):274–279, 1976.
- [34] Steven H Strogatz. Nonlinear dynamics and chaos with student solutions manual: With applications to physics, biology, chemistry, and engineering. CRC Press, 2018.
- [35] Robert Wilson. Incentive efficiency of double auctions. Econometrica, 53(5):1101–1115, 1985.
- [36] Yufei Wu, Mahmoud Mahfouz, Daniele Magazzeni, and Manuela Veloso. How robust are limit order book representations under data perturbation? arXiv preprint arXiv:2110.04752, 2021.
- [37] Robert Zeithammer and Christopher Adams. The sealed-bid abstraction in online auctions. Marketing Science, 29(6):964–987, 2010.
- [38] Zihao Zhang, Stefan Zohren, and Stephen Roberts. DeepLOB: Deep convolutional neural networks for limit order books. IEEE Transactions on Signal Processing, 67(11):3001–3012, 2019.
Appendix. The Python code and the pre-processed data are publicly available at https://github.com/asikist/inverse_experimental_markets. A Wilcoxon Tests for Model Comparison ...