pith. sign in

arxiv: 2602.13897 · v2 · submitted 2026-02-14 · 💻 cs.GT

Revenue-Optimal Pricing for Budget-Constrained Buyers in Data Markets

Pith reviewed 2026-05-15 21:59 UTC · model grok-4.3

classification 💻 cs.GT
keywords data marketsrevenue maximizationbudget constraintsAPX-hardnessapproximation algorithmsquasi-linear utilitiesfractional allocationsonline algorithms
0
0 comments X

The pith

Revenue maximization for pricing datasets to budget-constrained buyers is APX-hard, yet constant-factor approximations exist for online and offline cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies pricing multiple datasets to maximize total revenue when buyers have fixed budgets and want to improve prediction accuracy by acquiring data. Buyers may purchase any fraction of a dataset and pay proportionally, choosing bundles to maximize their quasi-linear utility. Although competitive equilibrium yields optimal revenue for ordinary rivalrous goods, the data setting makes exact revenue maximization APX-hard. The authors give a 2-approximation algorithm when datasets arrive one by one and a (1-1/e)^{-1}-approximation when all datasets are known in advance.

Core claim

In a data market offering multiple datasets, each buyer has a budget and a prediction task whose accuracy improves with acquired data fractions; buyers select the utility-maximizing fractional bundle under given prices. Revenue maximization over price vectors is APX-hard. A 2-approximation algorithm computes prices for the online arrival of datasets, while a (1-1/e)^{-1}-approximation algorithm computes prices when the full collection of datasets is known upfront.

What carries the argument

The fractional purchase model in which each buyer can acquire any fraction of each dataset and pays proportionally, optimizing a quasi-linear utility subject to a hard budget constraint.

If this is right

  • A seller facing sequential dataset arrivals can guarantee at least half the revenue of an optimal price vector.
  • When all datasets are known, the seller can guarantee revenue at least 1-1/e of the optimum.
  • Exact optimal prices cannot be computed in polynomial time unless P=NP.
  • The fractional model allows constant-factor revenue guarantees that are unavailable under indivisible purchases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Market platforms could implement the online algorithm to set prices dynamically as new datasets become available without sacrificing too much revenue.
  • The hardness result suggests that future work should identify natural restrictions on buyer valuations that restore polynomial-time solvability.
  • Fractional sales of data may be more practical than whole-dataset sales because they enable the reported approximation ratios.

Load-bearing premise

Buyers always select the bundle of data fractions that maximizes their quasi-linear utility given the posted prices and their budgets.

What would settle it

A polynomial-time algorithm that returns prices yielding revenue strictly larger than (1-1/e)^{-1} times the optimum on every instance, or a family of instances showing that no polynomial-time algorithm can guarantee better than the stated ratios.

Figures

Figures reproduced from arXiv: 2602.13897 by Bhaskar Ray Chaudhury, Eklavya Sharma, Jiaxin Song, Jugal Garg.

Figure 1
Figure 1. Figure 1: Prices for CE and SE of Example 1, where the prices are in red 2.2 Buyer Behavior and Structured Solutions While the non-rivalry makes revenue maximization computationally harder than the rivalrous set￾ting, it still offers a very clean closed-form characterization of the optimal revenue as function of the prices (unlike the rivalrous setting). To derive it, we must first understand how buyers behave. Afte… view at source ↗
Figure 2
Figure 2. Figure 2: Example where the deterministic greedy algorithm is suboptimal over all [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The values of f S ∩ Ua S ∩ Ub ∅ {b (1)} {b (2)} ∅ 0 1 4 {a (1)} 1 1 4 {a (2)} 4 5 4 {a (1), a(2)} ✗ 5 4 [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗
read the original abstract

We study revenue-optimal pricing in data markets with rational, budget-constrained buyers. Such a market offers multiple datasets for sale, and buyers aim to improve the accuracy of their prediction tasks by acquiring data bundles. The market's objective is to price datasets to maximize total revenue, considering that buyers with quasi-linear utilities choose their bundles optimally under budget constraints. We allow the buyers to purchase fractions of datasets, and the amount they pay is proportional to the fraction they receive. Although competitive equilibrium gives revenue-optimal pricing in rivalrous markets with quasi-linear buyers, we show that revenue maximization in data markets is APX-hard. Despite the hardness, we design a 2-approximation algorithm when datasets arrive online, and a $(1-1/e)^{-1}$-approximation algorithm for the offline setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript studies revenue-optimal pricing in data markets with multiple datasets offered to budget-constrained buyers who have quasi-linear utilities and can purchase arbitrary fractions of datasets with proportional payments. It shows that revenue maximization is APX-hard, unlike in rivalrous markets where competitive equilibrium is optimal. The paper provides a 2-approximation algorithm for the online setting and a (1-1/e)^{-1}-approximation for the offline setting.

Significance. If the hardness result and approximation algorithms hold, the paper makes a clear contribution to algorithmic game theory by identifying why data markets differ computationally from standard rivalrous-good settings and by supplying explicit, usable approximation guarantees. The online 2-approximation and offline (1-1/e)^{-1} bound are concrete strengths that could inform practical data-market design.

minor comments (1)
  1. [Abstract] Abstract: the phrase 'although competitive equilibrium gives revenue-optimal pricing in rivalrous markets' would benefit from a one-sentence reminder of the precise conditions (quasi-linear utilities, no budget constraints) under which that optimality holds, to sharpen the contrast with the data-market model.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate summary of our results on revenue-optimal pricing for budget-constrained buyers in data markets and for the positive recommendation of minor revision. The referee correctly identifies the APX-hardness result and the 2-approximation (online) and (1-1/e)^{-1}-approximation (offline) algorithms as the core contributions.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's core results—APX-hardness of revenue maximization and the stated approximation algorithms (2-approx online, (1-1/e)^{-1} offline)—are derived algorithmically from the explicit fractional quasi-linear utility model with budget constraints. No step reduces by construction to a fitted parameter, self-definition, or self-citation chain. The reference to competitive equilibrium optimality in rivalrous markets is a standard external fact, not load-bearing self-citation. The fractional purchase assumption is stated upfront as part of the model rather than smuggled in. The derivation is self-contained against the model inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions from mechanism design rather than new free parameters or invented entities.

axioms (2)
  • domain assumption Buyers have quasi-linear utilities and select optimal bundles under budget constraints
    Stated directly in the abstract as the buyer model.
  • domain assumption Fractional purchases of datasets are allowed with payments proportional to the fraction received
    Enables the continuous allocation model used for hardness and approximations.

pith-pipeline@v0.9.0 · 5439 in / 1280 out tokens · 32068 ms · 2026-05-15T21:59:16.257370+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. [1]

    On the theoretical foundations of data exchange economies

    [ACGM25] Hannaneh Akrami, Bhaskar Ray Chaudhury, Jugal Garg, and Aniket Murhekar. On the theoretical foundations of data exchange economies. InACM Conf. Economics and Computation (EC), pages 444–444, 2025.doi:10.1145/3736252.3742566. [Acu22] Acumen Research. Big data market size: Global industry, share, analysis, trends and forecast 2022 - 2030,

  2. [2]

    , author Debreu, G

    URL:https://www.acumenresearchandconsulting.com/ big-data-market. [AD54] Kenneth J Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy.Econometrica, 22(3):265–290, 1954.doi:10.2307/1907353. [ADHR24] Anish Agarwal, Munther A. Dahleh, Thibaut Horel, and Maryann Rui. Towards data auctions with externalities.Games Econ. Behav., 148:...

  3. [3]

    Submodular max-SAT

    [AGR11] Yossi Azar, Iftah Gamzu, and Ran Roth. Submodular max-SAT. InEuropean Symp. Algorithms (ESA), pages 323–334. Springer, 2011.doi:10.1007/978-3-642-23719-5 _28. [AP86] Anat R Admati and Paul Pfleiderer. A monopolistic market for information.Journal of Economic Theory, 39(2):400–438, 1986.doi:10.1016/0022-0531(86)90052-9. [AP90] Anat R Admati and Pau...

  4. [4]

    Partial function extension with applications to learning and property testing

    [BK20] Umang Bhaskar and Gunjan Kumar. Partial function extension with applications to learning and property testing. InIntl. Symp. Algorithms and Computation (ISAAC), volume 181, 2020.doi:10.4230/LIPIcs.ISAAC.2020.46. [BKL12] Moshe Babaioff, Robert Kleinberg, and Renato Paes Leme. Optimal mechanisms for selling information. InACM Conf. Electronic Commerc...

  5. [5]

    [BV25] Isaac Baley and Laura L

    doi:10.1145/2229012.2229024. [BV25] Isaac Baley and Laura L. Veldkamp.The Data Economy: Tools and Applications. Princeton University Press, 2025.doi:10.1515/9780691256740. [CCPV11] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. Maximizing a mono- tonesubmodularfunctionsubjecttoamatroidconstraint.SIAM Journal on Computing, 40(6):1740–1766, ...

  6. [6]

    Maximum coverage problem with group bud- get constraints and applications

    [CK04] Chandra Chekuri and Amit Kumar. Maximum coverage problem with group bud- get constraints and applications. InInternational Workshop on Randomization and Approximation Techniques in Computer Science, pages 72–83. Springer, 2004.doi: doi.org/10.1007/978-3-540-27821-4_7. [CKK24] Xi Chen, Christian Kroer, and Rachitesh Kumar. The complexity of pacing f...

  7. [7]

    The complexity of non-monotone markets.Journal of the ACM (JACM), 64(3):1–56, 2017.doi:10.1145/3064810

    [CPY17] Xi Chen, Dimitris Paparas, and Mihalis Yannakakis. The complexity of non-monotone markets.Journal of the ACM (JACM), 64(3):1–56, 2017.doi:10.1145/3064810. [CS06] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. InACM Conf. Electronic Commerce (EC), pages 82–90, 2006.doi:10.1145/1134 707.1134717. [CSVY06] Bruno Cod...

  8. [8]

    URL:https://dl.acm.org/doi/10.5555/1109557.110

  9. [9]

    Cover and Joy A

    [CT05] Thomas M. Cover and Joy A. Thomas. Entropy, relative entropy, and mutual in- formation. InElements of Information Theory, pages 13–55. Wiley, 2005.doi: 10.1002/047174882X.ch2. [CT09] Xi Chen and Shang-Hua Teng. Spending is not easier than trading: On the computa- tional equivalence of Fisher and Arrow-Debreu equilibria. InIntl. Symp. Algorithms and...

  10. [10]

    Journal of the ACM (JACM) , volume =

    doi:10.1145/1411509.1411512. [DV12] Shahar Dobzinski and Jan Vondrák. From query complexity to computational com- plexity. InSymp. Theory of Computing (STOC), pages 1107–1116, 2012.doi: 10.1145/2213977.2214076. 23 [EG59] Edmund Eisenberg and David Gale. Consensus of subjective probabilities: The pari- mutuel method.The Annals of Mathematical Statistics, 3...

  11. [11]

    [Eis61] Edmund Eisenberg

    URL: http://www.jstor.org/stable/2237130. [Eis61] Edmund Eisenberg. Aggregation of utility functions.Management Sci., 7(4):337–350, 1961.doi:10.1287/mnsc.7.4.337. [Fei98] Uriel Feige. A threshold oflnnfor approximating set cover.J. ACM, 45(4):634–652, 1998.doi:10.1145/285055.285059. [FGL23] Simon Finster, Paul Goldberg, and Edwin Lock. Substitutes markets...

  12. [12]

    Optimal and differentially private data acquisition: Central and local mechanisms.Operations Research, 72(3):1105–1123, 2024.doi:10.1287/opre.2022.0014

    [FMMO24] Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian, and Asuman Ozdaglar. Optimal and differentially private data acquisition: Central and local mechanisms.Operations Research, 72(3):1105–1123, 2024.doi:10.1287/opre.2022.0014. [FNS11] MoranFeldman, JosephNaor, andRoySchwartz. Aunifiedcontinuousgreedyalgorithm for submodular maximization. InSymp. Fo...

  13. [13]

    Approximating Nash social welfare by matching and local search

    [GHL+23] JugalGarg, EdinHusić, WenzhengLi, LászlóAVégh, andJanVondrák. Approximating Nash social welfare by matching and local search. InSymp. Theory of Computing (STOC), pages 1298–1310, 2023.doi:10.1145/3564246.3585255. 24 [GMVY17] Jugal Garg, Ruta Mehta, Vijay V. Vazirani, and Sadra Yazdanbod. Settling the com- plexity of Leontief and PLC exchange mark...

  14. [14]

    [HK12] Anna Huber and Vladimir Kolmogorov

    URL:https://proceedings.mlr.press/v235/hossain24a.ht ml. [HK12] Anna Huber and Vladimir Kolmogorov. Towards minimizingk-submodular functions. InInternational symposium on combinatorial optimization, pages 451–462. Springer, 2012.doi:10.1007/978-3-642-32147-4_40. [HS16] Johannes Hörner and Andrzej Skrzypacz. Selling information.Journal of Political Economy...

  15. [15]

    Competing data intermediaries.The RAND Journal of Economics, 52(3):515–537, 2021.doi:10.1111/1756-2171.12382

    [Ich21] Shota Ichihashi. Competing data intermediaries.The RAND Journal of Economics, 52(3):515–537, 2021.doi:10.1111/1756-2171.12382. [ITY16] Satoru Iwata, Shin-ichi Tanigawa, and Yuichi Yoshida. Improved approximation algo- rithmsfork-submodularfunctionmaximization. InSymp. Discrete Algorithms (SODA), page 404–413, 2016.doi:10.1137/1.9781611974331.ch30....

  16. [16]

    Lipton, Evangelos Markakis, and Aranyak Mehta

    [KLMM08] Subhash Khot, Richard J. Lipton, Evangelos Markakis, and Aranyak Mehta. Inapprox- imability results for combinatorial auctions with submodular utility functions.Algo- rithmica, 52:3–18, 2008.doi:10.1007/s00453-007-9105-7. [MDJM21] Sameer Mehta, Milind Dawande, Ganesh Janakiraman, and Vijay Mookerjee. How to sell a data set? pricing policies for d...

  17. [17]

    Michael Ostrovsky and Michael Schwarz

    25 [Mye81] Roger B Myerson. Optimal auction design.Mathematics of operations research, 6(1):58– 73, 1981.doi:10.1287/moor.6.1.58. [Nul26] Jake Nulty. Top 15 data marketplaces of 2026: Best platforms ranked,

  18. [18]

    URL:https://brightdata.com/blog/web-data/best-data-marketpla ces

    Accessed: 2026-01-28. URL:https://brightdata.com/blog/web-data/best-data-marketpla ces. [NVX14] Kobbi Nissim, Salil Vadhan, and David Xiao. Redrawing the boundaries on purchasing data from privacy-sensitive individuals. InSymp. Innovations in Theoret. Computer Science (ITCS), pages 411–422,

  19. [19]

    Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin

    [NWF78] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of ap- proximations for maximizing submodular set functions—i.Mathematical programming, 14(1):265–294, 1978.doi:10.1007/BF01588971. [Orl10] James Orlin. Improved algorithms for computing Fisher’s market clearing prices. In Symp. Theory of Computing (STOC), pages 291–300, 201...

  20. [20]

    Influence maximization on undirected graphs: Toward closing the(1−1/e)gap.ACM Transactions on Economics and Computation, 8:22:1–22:36, 2020.doi:10.1145/3417748

    [ST20] Grant Schoenebeck and Biaoshuai Tao. Influence maximization on undirected graphs: Toward closing the(1−1/e)gap.ACM Transactions on Economics and Computation, 8:22:1–22:36, 2020.doi:10.1145/3417748. [Sug26] Suger.io Documentation. Snowflake marketplace product pricing plans,

  21. [21]

    URL:https://doc.suger.io/snowflake-marketplace/pricing_plans/

    Accessed: 2026-01-28. URL:https://doc.suger.io/snowflake-marketplace/pricing_plans/. [Var09] Hal R Varian. Economic aspects of personal privacy. InInternet Policy and Economics: Challenges and Perspectives, pages 101–109. Springer,

  22. [22]

    [Vel23] Laura Veldkamp

    doi:10.1137/140978296. [Vel23] Laura Veldkamp. Valuing data as an asset.Review of Finance, 27(5):1545–1562,

  23. [23]

    26 [vN28] J

    doi:10.1093/rof/rfac073. 26 [vN28] J. von Neumann. Zur theorie der gesellschaftsspiele.Mathematische Annalen, 100(1):295–320, 1928.doi:10.1007/BF01448847. [Von08] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. InSymp. Theory of Computing (STOC), pages 67–74, 2008.doi:10.114 5/1374376.1374389. [vS34] Heinri...

  24. [24]

    [VSZ10] B

    doi:10.2307/2224643. [VSZ10] B. Von Stengel and S. Zamir. Leadership games with convex strategy sets.Games and Economic Behavior, 69(2):446–457, 2010.doi:10.1016/j.geb.2009.11.008. [WZ16] Justin Ward and Stanislav Zivny. Maximizingk-submodular functions and beyond. ACM Transactions on Algorithms (TALG), 12(4):1–26, 2016.doi:10.1145/2850419. 27