Revenue-Optimal Pricing for Budget-Constrained Buyers in Data Markets
Pith reviewed 2026-05-15 21:59 UTC · model grok-4.3
The pith
Revenue maximization for pricing datasets to budget-constrained buyers is APX-hard, yet constant-factor approximations exist for online and offline cases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a data market offering multiple datasets, each buyer has a budget and a prediction task whose accuracy improves with acquired data fractions; buyers select the utility-maximizing fractional bundle under given prices. Revenue maximization over price vectors is APX-hard. A 2-approximation algorithm computes prices for the online arrival of datasets, while a (1-1/e)^{-1}-approximation algorithm computes prices when the full collection of datasets is known upfront.
What carries the argument
The fractional purchase model in which each buyer can acquire any fraction of each dataset and pays proportionally, optimizing a quasi-linear utility subject to a hard budget constraint.
If this is right
- A seller facing sequential dataset arrivals can guarantee at least half the revenue of an optimal price vector.
- When all datasets are known, the seller can guarantee revenue at least 1-1/e of the optimum.
- Exact optimal prices cannot be computed in polynomial time unless P=NP.
- The fractional model allows constant-factor revenue guarantees that are unavailable under indivisible purchases.
Where Pith is reading between the lines
- Market platforms could implement the online algorithm to set prices dynamically as new datasets become available without sacrificing too much revenue.
- The hardness result suggests that future work should identify natural restrictions on buyer valuations that restore polynomial-time solvability.
- Fractional sales of data may be more practical than whole-dataset sales because they enable the reported approximation ratios.
Load-bearing premise
Buyers always select the bundle of data fractions that maximizes their quasi-linear utility given the posted prices and their budgets.
What would settle it
A polynomial-time algorithm that returns prices yielding revenue strictly larger than (1-1/e)^{-1} times the optimum on every instance, or a family of instances showing that no polynomial-time algorithm can guarantee better than the stated ratios.
Figures
read the original abstract
We study revenue-optimal pricing in data markets with rational, budget-constrained buyers. Such a market offers multiple datasets for sale, and buyers aim to improve the accuracy of their prediction tasks by acquiring data bundles. The market's objective is to price datasets to maximize total revenue, considering that buyers with quasi-linear utilities choose their bundles optimally under budget constraints. We allow the buyers to purchase fractions of datasets, and the amount they pay is proportional to the fraction they receive. Although competitive equilibrium gives revenue-optimal pricing in rivalrous markets with quasi-linear buyers, we show that revenue maximization in data markets is APX-hard. Despite the hardness, we design a 2-approximation algorithm when datasets arrive online, and a $(1-1/e)^{-1}$-approximation algorithm for the offline setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies revenue-optimal pricing in data markets with multiple datasets offered to budget-constrained buyers who have quasi-linear utilities and can purchase arbitrary fractions of datasets with proportional payments. It shows that revenue maximization is APX-hard, unlike in rivalrous markets where competitive equilibrium is optimal. The paper provides a 2-approximation algorithm for the online setting and a (1-1/e)^{-1}-approximation for the offline setting.
Significance. If the hardness result and approximation algorithms hold, the paper makes a clear contribution to algorithmic game theory by identifying why data markets differ computationally from standard rivalrous-good settings and by supplying explicit, usable approximation guarantees. The online 2-approximation and offline (1-1/e)^{-1} bound are concrete strengths that could inform practical data-market design.
minor comments (1)
- [Abstract] Abstract: the phrase 'although competitive equilibrium gives revenue-optimal pricing in rivalrous markets' would benefit from a one-sentence reminder of the precise conditions (quasi-linear utilities, no budget constraints) under which that optimality holds, to sharpen the contrast with the data-market model.
Simulated Author's Rebuttal
We thank the referee for the accurate summary of our results on revenue-optimal pricing for budget-constrained buyers in data markets and for the positive recommendation of minor revision. The referee correctly identifies the APX-hardness result and the 2-approximation (online) and (1-1/e)^{-1}-approximation (offline) algorithms as the core contributions.
Circularity Check
No significant circularity in derivation chain
full rationale
The paper's core results—APX-hardness of revenue maximization and the stated approximation algorithms (2-approx online, (1-1/e)^{-1} offline)—are derived algorithmically from the explicit fractional quasi-linear utility model with budget constraints. No step reduces by construction to a fitted parameter, self-definition, or self-citation chain. The reference to competitive equilibrium optimality in rivalrous markets is a standard external fact, not load-bearing self-citation. The fractional purchase assumption is stated upfront as part of the model rather than smuggled in. The derivation is self-contained against the model inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Buyers have quasi-linear utilities and select optimal bundles under budget constraints
- domain assumption Fractional purchases of datasets are allowed with payments proportional to the fraction received
Reference graph
Works this paper leans on
-
[1]
On the theoretical foundations of data exchange economies
[ACGM25] Hannaneh Akrami, Bhaskar Ray Chaudhury, Jugal Garg, and Aniket Murhekar. On the theoretical foundations of data exchange economies. InACM Conf. Economics and Computation (EC), pages 444–444, 2025.doi:10.1145/3736252.3742566. [Acu22] Acumen Research. Big data market size: Global industry, share, analysis, trends and forecast 2022 - 2030,
-
[2]
URL:https://www.acumenresearchandconsulting.com/ big-data-market. [AD54] Kenneth J Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy.Econometrica, 22(3):265–290, 1954.doi:10.2307/1907353. [ADHR24] Anish Agarwal, Munther A. Dahleh, Thibaut Horel, and Maryann Rui. Towards data auctions with externalities.Games Econ. Behav., 148:...
-
[3]
[AGR11] Yossi Azar, Iftah Gamzu, and Ran Roth. Submodular max-SAT. InEuropean Symp. Algorithms (ESA), pages 323–334. Springer, 2011.doi:10.1007/978-3-642-23719-5 _28. [AP86] Anat R Admati and Paul Pfleiderer. A monopolistic market for information.Journal of Economic Theory, 39(2):400–438, 1986.doi:10.1016/0022-0531(86)90052-9. [AP90] Anat R Admati and Pau...
-
[4]
Partial function extension with applications to learning and property testing
[BK20] Umang Bhaskar and Gunjan Kumar. Partial function extension with applications to learning and property testing. InIntl. Symp. Algorithms and Computation (ISAAC), volume 181, 2020.doi:10.4230/LIPIcs.ISAAC.2020.46. [BKL12] Moshe Babaioff, Robert Kleinberg, and Renato Paes Leme. Optimal mechanisms for selling information. InACM Conf. Electronic Commerc...
-
[5]
[BV25] Isaac Baley and Laura L
doi:10.1145/2229012.2229024. [BV25] Isaac Baley and Laura L. Veldkamp.The Data Economy: Tools and Applications. Princeton University Press, 2025.doi:10.1515/9780691256740. [CCPV11] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. Maximizing a mono- tonesubmodularfunctionsubjecttoamatroidconstraint.SIAM Journal on Computing, 40(6):1740–1766, ...
-
[6]
Maximum coverage problem with group bud- get constraints and applications
[CK04] Chandra Chekuri and Amit Kumar. Maximum coverage problem with group bud- get constraints and applications. InInternational Workshop on Randomization and Approximation Techniques in Computer Science, pages 72–83. Springer, 2004.doi: doi.org/10.1007/978-3-540-27821-4_7. [CKK24] Xi Chen, Christian Kroer, and Rachitesh Kumar. The complexity of pacing f...
-
[7]
[CPY17] Xi Chen, Dimitris Paparas, and Mihalis Yannakakis. The complexity of non-monotone markets.Journal of the ACM (JACM), 64(3):1–56, 2017.doi:10.1145/3064810. [CS06] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. InACM Conf. Electronic Commerce (EC), pages 82–90, 2006.doi:10.1145/1134 707.1134717. [CSVY06] Bruno Cod...
-
[8]
URL:https://dl.acm.org/doi/10.5555/1109557.110
-
[9]
[CT05] Thomas M. Cover and Joy A. Thomas. Entropy, relative entropy, and mutual in- formation. InElements of Information Theory, pages 13–55. Wiley, 2005.doi: 10.1002/047174882X.ch2. [CT09] Xi Chen and Shang-Hua Teng. Spending is not easier than trading: On the computa- tional equivalence of Fisher and Arrow-Debreu equilibria. InIntl. Symp. Algorithms and...
-
[10]
Journal of the ACM (JACM) , volume =
doi:10.1145/1411509.1411512. [DV12] Shahar Dobzinski and Jan Vondrák. From query complexity to computational com- plexity. InSymp. Theory of Computing (STOC), pages 1107–1116, 2012.doi: 10.1145/2213977.2214076. 23 [EG59] Edmund Eisenberg and David Gale. Consensus of subjective probabilities: The pari- mutuel method.The Annals of Mathematical Statistics, 3...
-
[11]
URL: http://www.jstor.org/stable/2237130. [Eis61] Edmund Eisenberg. Aggregation of utility functions.Management Sci., 7(4):337–350, 1961.doi:10.1287/mnsc.7.4.337. [Fei98] Uriel Feige. A threshold oflnnfor approximating set cover.J. ACM, 45(4):634–652, 1998.doi:10.1145/285055.285059. [FGL23] Simon Finster, Paul Goldberg, and Edwin Lock. Substitutes markets...
-
[12]
[FMMO24] Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian, and Asuman Ozdaglar. Optimal and differentially private data acquisition: Central and local mechanisms.Operations Research, 72(3):1105–1123, 2024.doi:10.1287/opre.2022.0014. [FNS11] MoranFeldman, JosephNaor, andRoySchwartz. Aunifiedcontinuousgreedyalgorithm for submodular maximization. InSymp. Fo...
-
[13]
Approximating Nash social welfare by matching and local search
[GHL+23] JugalGarg, EdinHusić, WenzhengLi, LászlóAVégh, andJanVondrák. Approximating Nash social welfare by matching and local search. InSymp. Theory of Computing (STOC), pages 1298–1310, 2023.doi:10.1145/3564246.3585255. 24 [GMVY17] Jugal Garg, Ruta Mehta, Vijay V. Vazirani, and Sadra Yazdanbod. Settling the com- plexity of Leontief and PLC exchange mark...
-
[14]
[HK12] Anna Huber and Vladimir Kolmogorov
URL:https://proceedings.mlr.press/v235/hossain24a.ht ml. [HK12] Anna Huber and Vladimir Kolmogorov. Towards minimizingk-submodular functions. InInternational symposium on combinatorial optimization, pages 451–462. Springer, 2012.doi:10.1007/978-3-642-32147-4_40. [HS16] Johannes Hörner and Andrzej Skrzypacz. Selling information.Journal of Political Economy...
-
[15]
[Ich21] Shota Ichihashi. Competing data intermediaries.The RAND Journal of Economics, 52(3):515–537, 2021.doi:10.1111/1756-2171.12382. [ITY16] Satoru Iwata, Shin-ichi Tanigawa, and Yuichi Yoshida. Improved approximation algo- rithmsfork-submodularfunctionmaximization. InSymp. Discrete Algorithms (SODA), page 404–413, 2016.doi:10.1137/1.9781611974331.ch30....
-
[16]
Lipton, Evangelos Markakis, and Aranyak Mehta
[KLMM08] Subhash Khot, Richard J. Lipton, Evangelos Markakis, and Aranyak Mehta. Inapprox- imability results for combinatorial auctions with submodular utility functions.Algo- rithmica, 52:3–18, 2008.doi:10.1007/s00453-007-9105-7. [MDJM21] Sameer Mehta, Milind Dawande, Ganesh Janakiraman, and Vijay Mookerjee. How to sell a data set? pricing policies for d...
-
[17]
Michael Ostrovsky and Michael Schwarz
25 [Mye81] Roger B Myerson. Optimal auction design.Mathematics of operations research, 6(1):58– 73, 1981.doi:10.1287/moor.6.1.58. [Nul26] Jake Nulty. Top 15 data marketplaces of 2026: Best platforms ranked,
-
[18]
URL:https://brightdata.com/blog/web-data/best-data-marketpla ces
Accessed: 2026-01-28. URL:https://brightdata.com/blog/web-data/best-data-marketpla ces. [NVX14] Kobbi Nissim, Salil Vadhan, and David Xiao. Redrawing the boundaries on purchasing data from privacy-sensitive individuals. InSymp. Innovations in Theoret. Computer Science (ITCS), pages 411–422,
work page 2026
-
[19]
Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin
[NWF78] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of ap- proximations for maximizing submodular set functions—i.Mathematical programming, 14(1):265–294, 1978.doi:10.1007/BF01588971. [Orl10] James Orlin. Improved algorithms for computing Fisher’s market clearing prices. In Symp. Theory of Computing (STOC), pages 291–300, 201...
-
[20]
[ST20] Grant Schoenebeck and Biaoshuai Tao. Influence maximization on undirected graphs: Toward closing the(1−1/e)gap.ACM Transactions on Economics and Computation, 8:22:1–22:36, 2020.doi:10.1145/3417748. [Sug26] Suger.io Documentation. Snowflake marketplace product pricing plans,
-
[21]
URL:https://doc.suger.io/snowflake-marketplace/pricing_plans/
Accessed: 2026-01-28. URL:https://doc.suger.io/snowflake-marketplace/pricing_plans/. [Var09] Hal R Varian. Economic aspects of personal privacy. InInternet Policy and Economics: Challenges and Perspectives, pages 101–109. Springer,
work page 2026
-
[22]
doi:10.1137/140978296. [Vel23] Laura Veldkamp. Valuing data as an asset.Review of Finance, 27(5):1545–1562,
-
[23]
doi:10.1093/rof/rfac073. 26 [vN28] J. von Neumann. Zur theorie der gesellschaftsspiele.Mathematische Annalen, 100(1):295–320, 1928.doi:10.1007/BF01448847. [Von08] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. InSymp. Theory of Computing (STOC), pages 67–74, 2008.doi:10.114 5/1374376.1374389. [vS34] Heinri...
-
[24]
doi:10.2307/2224643. [VSZ10] B. Von Stengel and S. Zamir. Leadership games with convex strategy sets.Games and Economic Behavior, 69(2):446–457, 2010.doi:10.1016/j.geb.2009.11.008. [WZ16] Justin Ward and Stanislav Zivny. Maximizingk-submodular functions and beyond. ACM Transactions on Algorithms (TALG), 12(4):1–26, 2016.doi:10.1145/2850419. 27
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.