
arxiv: 2604.12139 · v1 · submitted 2026-04-13 · 🧮 math.OC

Estimating Price Elasticity Matrices

Pith reviewed 2026-05-10 14:48 UTC · model grok-4.3

classification 🧮 math.OC
keywords price elasticity matrix · demand estimation · bi-convex optimization · factor model · maximum likelihood · optimal pricing · gradient ascent

The pith

A diagonal-plus-low-rank structure turns price elasticity matrix estimation into a bi-convex likelihood maximization problem solvable by gradient ascent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models demand response to prices via an elasticity matrix that linearly maps logarithmic price changes to logarithmic demand changes. From observed prices and integer demands, direct estimation is ill-posed, so the authors impose a diagonal-plus-low-rank factor structure that regularizes the matrix and makes maximum-likelihood estimation bi-convex. They compare three local solvers—alternating maximization over convex subproblems, gradient ascent with efficient gradients, and general nonlinear programming—finding that gradient ascent is substantially faster while all reach the same solutions. On synthetic and real data the likelihood surface is similar; on synthetic data the hyper-parameters that maximize likelihood also maximize realized profit when the fitted matrix is used for optimal pricing.

Core claim

The elasticity matrix relating log-price changes to log-demand changes is estimated by maximizing the likelihood of observed prices and integer demands under the modeling assumption that the matrix is diagonal plus low-rank. This assumption renders the likelihood function bi-convex, so that it is convex in each block of variables when the other block is held fixed. Three algorithms are developed and benchmarked: alternating maximization that solves a sequence of convex problems, gradient ascent that exploits closed-form gradient expressions, and a general-purpose nonlinear solver. Gradient ascent is markedly faster on the tested instances. Likelihood values are reported for varying hyper-par

What carries the argument

The diagonal-plus-low-rank elasticity matrix, whose low-rank factor component captures cross-product price effects and whose diagonal captures own-price effects; this decomposition renders the maximum-likelihood objective bi-convex.
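A minimal NumPy sketch of this structure, under stated assumptions: unit nominal demand (so the Poisson log-rates are simply E @ Pi), no regularization term, and a variable partition of (s, B) versus C. The names s, B, C, Pi, and D echo the variables visible in the paper's figure captions, but the functions `elasticity` and `avg_loglik` are hypothetical and are not the authors' code. The sketch evaluates the average Poisson log-likelihood (dropping the term that does not depend on E) and checks numerically that, with C held fixed, the objective is concave in (s, B), which is the sense in which the maximization problem is bi-convex.

```python
import numpy as np

def elasticity(s, B, C):
    """Diagonal-plus-low-rank matrix E = diag(s) + B @ C.T."""
    return np.diag(s) + B @ C.T

def avg_loglik(s, B, C, Pi, D):
    """Average Poisson log-likelihood, up to a constant independent of E.

    Pi: (n, N) log-price changes, D: (n, N) integer demands.
    Assumes unit nominal demand, so the log-rates are Z = E @ Pi.
    """
    Z = elasticity(s, B, C) @ Pi
    return np.sum(D * Z - np.exp(Z)) / Pi.shape[1]

rng = np.random.default_rng(0)
n, r, N = 6, 2, 40
Pi = rng.uniform(-0.2, 0.2, size=(n, N))
D = rng.poisson(1.0, size=(n, N))
C = rng.normal(size=(n, r))

# Two arbitrary points in the (s, B) block, with C held fixed.
s0, B0 = rng.normal(size=n), rng.normal(size=(n, r))
s1, B1 = rng.normal(size=n), rng.normal(size=(n, r))

f0 = avg_loglik(s0, B0, C, Pi, D)
f1 = avg_loglik(s1, B1, C, Pi, D)
f_mid = avg_loglik((s0 + s1) / 2, (B0 + B1) / 2, C, Pi, D)

# Concavity in (s, B): the midpoint value is at least the average of the ends.
concave_gap = f_mid - 0.5 * (f0 + f1)
```

The gap is non-negative because the log-rates are affine in (s, B) when C is fixed and the objective is concave in the log-rates. The same argument applies to C when (s, B) is fixed, but not to all variables jointly.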

If this is right

  • Gradient ascent supplies the fastest practical method for recovering the matrix from price-demand observations.
  • Hyper-parameter selection by maximum likelihood coincides with selection by downstream pricing profit on synthetic instances.
  • The same modeling and solution pipeline produces comparable likelihood surfaces on both synthetic and real retail data.
  • Open-source Python implementations of all three solvers are supplied for direct use on new datasets.
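The gradient ascent idea in the list above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it assumes unit nominal demand, omits the regularization term, and replaces the paper's stopping rule (which uses relative and absolute tolerances) with an iteration cap and a simple backtracking step size. The function names are hypothetical. The point it shows is that the gradients have closed forms that never require forming the n x n matrix E.

```python
import numpy as np

def objective(s, B, C, Pi, D):
    """Average Poisson log-likelihood (constant dropped), unit nominal demand."""
    Z = s[:, None] * Pi + B @ (C.T @ Pi)   # E @ Pi without forming E
    return np.sum(D * Z - np.exp(Z)) / Pi.shape[1]

def gradients(s, B, C, Pi, D):
    """Closed-form gradients with respect to s, B, and C."""
    N = Pi.shape[1]
    Z = s[:, None] * Pi + B @ (C.T @ Pi)
    R = (D - np.exp(Z)) / N                # derivative of the objective w.r.t. Z
    grad_s = np.sum(R * Pi, axis=1)        # diagonal of R @ Pi.T
    grad_B = R @ (Pi.T @ C)                # (R @ Pi.T) @ C
    grad_C = Pi @ (R.T @ B)                # (R @ Pi.T).T @ B
    return grad_s, grad_B, grad_C

def gradient_ascent(s, B, C, Pi, D, step=1.0, iters=200):
    """Gradient ascent with a simple backtracking step-size rule."""
    f = objective(s, B, C, Pi, D)
    for _ in range(iters):
        gs, gB, gC = gradients(s, B, C, Pi, D)
        improved = False
        while step > 1e-12:
            cand = (s + step * gs, B + step * gB, C + step * gC)
            f_new = objective(*cand, Pi, D)
            if f_new > f:                  # accept only strictly improving steps
                s, B, C = cand
                f = f_new
                step *= 2.0
                improved = True
                break
            step *= 0.5
        if not improved:                   # no improving step found: stop
            break
    return s, B, C, f

# Small synthetic instance (sizes chosen for illustration only).
rng = np.random.default_rng(0)
n, r, N = 8, 2, 300
Pi = rng.uniform(-0.2, 0.2, size=(n, N))
s_true = -rng.uniform(1.0, 3.0, size=n)    # negative own-price elasticities
B_true = 0.3 * rng.normal(size=(n, r))
C_true = 0.3 * rng.normal(size=(n, r))
D = rng.poisson(np.exp(s_true[:, None] * Pi + B_true @ (C_true.T @ Pi)))

s0 = np.zeros(n)
B0 = 0.1 * rng.normal(size=(n, r))
C0 = 0.1 * rng.normal(size=(n, r))
f_start = objective(s0, B0, C0, Pi, D)
s_hat, B_hat, C_hat, f_end = gradient_ascent(s0, B0, C0, Pi, D)
```

Each gradient costs a few products between n x N and n x r (or N x r) matrices, which is one plausible reading of why gradient evaluations are cheap here. The result is a local optimum only, since the joint problem is not convex.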

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The alignment of likelihood and profit maxima suggests the low-rank structure captures economically relevant demand interactions that matter for revenue optimization.
  • The same factor-model regularization could be applied to other linear response estimation problems where cross-effects are expected to be low-dimensional.
  • If the rank of the low-rank factor is chosen by cross-validation on held-out demand predictions, the resulting matrix may generalize better to new price vectors than an unregularized estimate.

Load-bearing premise

The true elasticity matrix for the observed products is well approximated by a diagonal-plus-low-rank structure.

What would settle it

On synthetic data generated from a full-rank elasticity matrix, the hyper-parameters that maximize the fitted likelihood fail to maximize the profit obtained by using the estimate for optimal pricing.
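A sketch of how the data for such a test might be generated, assuming unit nominal demand and Poisson demands as in the paper's synthetic setup. The sizes, the scale of the cross-price terms, and the price range are illustrative assumptions, and `tail_energy` is only a crude indicator of how far the off-diagonal part is from rank r.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 20, 200

# Dense (full-rank) elasticity matrix: negative own-price terms on the diagonal,
# small unstructured cross-price terms everywhere else.
E_full = 0.1 * rng.normal(size=(n, n))
E_full[np.diag_indices(n)] = -rng.uniform(1.0, 3.0, size=n)

# Log-price changes, here drawn uniformly between log(0.8) and log(1.2).
Pi = rng.uniform(np.log(0.8), np.log(1.2), size=(n, N))

# Poisson demands with rates exp(E @ pi), i.e. unit nominal demand.
rates = np.exp(E_full @ Pi)
D = rng.poisson(rates)

# Share of the off-diagonal part's squared singular values that the
# leading r singular values do not capture (crude misspecification measure).
off_diag = E_full - np.diag(np.diag(E_full))
sv = np.linalg.svd(off_diag, compute_uv=False)
r = 5
tail_energy = np.sum(sv[r:] ** 2) / np.sum(sv ** 2)
```

Fitting the diagonal-plus-low-rank model to `D` and `Pi` and then comparing likelihood-selected and profit-selected hyper-parameters would be the misspecified experiment; that step is left out here because it depends on the estimator and pricing problem of the paper.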

Figures

Figures reproduced from arXiv: 2604.12139 by Maximilian Schaller, Stephen Boyd.

Figure 1
Figure 1: Alternating maximization with CVXPY. The dimensions n, r, N, and the data D, Pitilde, lam are given.
Figure 2
Figure 2: Gradient ascent in Python. The initial value of X, the Python functions g and grad, and the tolerances eps_rel and eps_abs are given.
Code excerpt (truncated in the source):
    import cvxpy as cp

    # variables
    Etilde = cp.Variable((n, n + 1))
    B = cp.Variable((n, r))
    C = cp.Variable((n, r))
    s = cp.Variable(n)

    # objective, constraint, and problem
    f = cp.sum(cp.multiply(D, Etilde @ Pitilde) - cp.exp(Etilde @ Pitilde)) / N
    obj = c…
Figure 3
Figure 3: Nonlinear programming with CVXPY. The dimensions n, r, N, and the data D, Pitilde, lam are given.
Figure 4
Figure 4: Log-likelihood for varying r and λ, with synthetic data.
Ultimately, we set $d^{\mathrm{nom,syn}} = 1$ so $\log d^{\mathrm{nom,syn}} = 0$ and the Poisson rates are $\lambda^{(j)} = \exp(E^{\mathrm{syn}} \pi^{(j)})$, $j = 1, \ldots, N$. We then generate the columns $d^{(j)}$ of $D$ from Poisson distributions with respective rates $\lambda^{(j)}$. We generate the entries of the cost vector $c$ IID from $[0.8, 1.2]$. Results. For $n = 100$, $N = 200$, and $r^{\mathrm{syn}} = 10$, we compare the tim…
Figure 5
Figure 5: Estimated elasticity matrix $E^\star$ for $\lambda = 0.1$ and $r = 10$ and true (synthetic) elasticity matrix $E^{\mathrm{syn}}$.
… method recovers the true elasticity matrix, up to slightly shrunk cross-elasticities due to regularization. We also compute the cross-validated pricing performance (9), i.e., the simulated profit after solving the optimal pricing problem (8) with our elasticity estimate. We take $\pi^{\min} = \log(0.8)\mathbf{1}$ and $\pi^{\max}$ …
Figure 6
Figure 6: Pricing performance for varying r and λ, with synthetic data.
[Plot residue: panels (a) log-likelihood for varying r at λ = 1.0 and (b) log-likelihood for varying λ at r = 14.]
Figure 7
Figure 7: Log-likelihood for varying r and λ, with real data.
Results. We consider $r \in \{2, 4, \ldots, 18\}$ and $\lambda \in \{10^{-3}, 10^{-2}, \ldots, 10^{1}\}$. The cross-validated log-likelihood (7) is shown for K = 5 folds in figure 7. We obtain the highest log-likelihood at r = 14 and λ = 1.0. The median relative error of the predicted (mean) demand with respect to the true demand, over all products and time steps, is 34% (cross-v…
Original abstract

The relationship between demand and prices of a set of products can be modeled as a linear mapping from logarithmic price changes to logarithmic changes in demand. We consider the problem of estimating the coefficient matrix of this mapping, the elasticity matrix, based on observed data consisting of real-valued prices and integer-valued demands. We regularize the estimation problem by imposing a factor model structure, i.e., that the elasticity matrix is diagonal plus low-rank, similar to factor models used for financial returns. Maximizing the likelihood of observations of this model is a bi-convex problem, meaning that there is a partition of the variables in which it is convex in each set when the other is fixed. We propose and compare three methods for finding a locally optimal estimate. The first is based on alternating maximization, and involves solving a sequence of convex problems. The second method exploits efficient gradient computations in a gradient ascent method. The final method is to use a general purpose nonlinear programming method. While all methods give the same result on numerical examples, the gradient ascent method is substantially faster, due to its efficient gradient evaluations. We report the likelihood with different hyper-parameters for synthetic and real-world data, with similar results. For synthetic data, we also report the realized profit when using the elasticity estimate for optimal pricing, which is maximized for the same set of hyper-parameters that also maximizes the likelihood. This paper is accompanied by easy to use open source Python code for fitting elasticity matrices to observed data, using our three numerical methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper models the relationship between logarithmic price changes and logarithmic demand changes via an elasticity matrix that is assumed to be diagonal plus low-rank. It formulates maximum likelihood estimation as a bi-convex optimization problem and proposes three methods (alternating maximization, gradient ascent, and general NLP) to find local optima. The methods are compared on synthetic and real-world data, showing similar likelihoods, with gradient ascent being faster. Open-source Python code is provided. On synthetic data, hyper-parameters maximizing likelihood also maximize realized profit from optimal pricing.

Significance. This work offers a practical, regularized approach to estimating price elasticity matrices for revenue management, supported by bi-convex optimization theory and reproducible code. The comparison of methods and the observation that likelihood and profit optima align on synthetic data are useful, though the latter may be limited by the data generation process.

major comments (2)
  1. [synthetic data results] Synthetic data experiments: The paper reports that hyper-parameters maximizing the likelihood also maximize realized profit when using the estimate for optimal pricing. However, if the synthetic data is generated from the same diagonal-plus-low-rank model (standard practice), this alignment is expected by construction and does not independently validate the modeling choice or the estimator's robustness to misspecification. An additional experiment with misspecified synthetic data would strengthen the claim.
  2. [optimization methods] Optimization methods section: The manuscript relies on local optima from the three methods. While they agree on the examples, the bi-convexity does not guarantee global optimality, and the post-hoc selection of hyper-parameters based on downstream profit (even if it coincides with likelihood) introduces a risk that the reported performance is optimistic.
minor comments (2)
  1. [abstract] The abstract states that all methods give the same result on numerical examples, but it would be helpful to quantify the agreement (e.g., via parameter differences or likelihood values) in the main text.
  2. [conclusion] The paper mentions accompanying open source Python code; including the repository URL or installation instructions would improve reproducibility.
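One way the agreement between solvers requested in minor comment 1 could be quantified is sketched below. The helper `rel_diff_E` is hypothetical. It compares the fitted matrices E rather than the factors, since the factors B and C are not unique: multiplying B by an invertible matrix T and C by the inverse transpose of T leaves B @ C.T unchanged.

```python
import numpy as np

def rel_diff_E(params_a, params_b):
    """Relative Frobenius-norm difference between two fitted elasticity matrices.

    Each argument is a tuple (s, B, C); the comparison is made on
    E = diag(s) + B @ C.T, not on the individual factors.
    """
    E_a = np.diag(params_a[0]) + params_a[1] @ params_a[2].T
    E_b = np.diag(params_b[0]) + params_b[1] @ params_b[2].T
    return np.linalg.norm(E_a - E_b) / max(np.linalg.norm(E_a), 1e-12)

rng = np.random.default_rng(2)
n, r = 5, 2
s = rng.normal(size=n)
B, C = rng.normal(size=(n, r)), rng.normal(size=(n, r))

# The same matrix E expressed through different factors.
T = np.array([[2.0, 1.0], [0.0, 0.5]])
B_alt, C_alt = B @ T, C @ np.linalg.inv(T).T

factor_gap = np.linalg.norm(B - B_alt) / np.linalg.norm(B)   # large
matrix_gap = rel_diff_E((s, B, C), (s, B_alt, C_alt))        # near zero
```

Reporting this matrix-level difference together with the difference in attained log-likelihood would make the claim that all methods give the same result checkable.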

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive recommendation. We address each major comment below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [synthetic data results] Synthetic data experiments: The paper reports that hyper-parameters maximizing the likelihood also maximize realized profit when using the estimate for optimal pricing. However, if the synthetic data is generated from the same diagonal-plus-low-rank model (standard practice), this alignment is expected by construction and does not independently validate the modeling choice or the estimator's robustness to misspecification. An additional experiment with misspecified synthetic data would strengthen the claim.

    Authors: We agree that the observed alignment is expected under the well-specified data-generating process used in the current experiments. Our purpose was to verify internal consistency between the likelihood objective and downstream pricing performance when the model assumptions hold. To address the concern regarding misspecification, we will add a new synthetic experiment in which data is generated from a dense (non-diagonal-plus-low-rank) elasticity matrix. Performance of the estimator and the resulting pricing profit will be reported for this misspecified case in the revised manuscript. revision: yes

  2. Referee: [optimization methods] Optimization methods section: The manuscript relies on local optima from the three methods. While they agree on the examples, the bi-convexity does not guarantee global optimality, and the post-hoc selection of hyper-parameters based on downstream profit (even if it coincides with likelihood) introduces a risk that the reported performance is optimistic.

    Authors: We acknowledge that bi-convexity guarantees convexity only when one block of variables is fixed and does not ensure global optimality of the joint problem. In the revision we will add an explicit statement of this limitation. We will also emphasize that the three independent methods (alternating maximization, gradient ascent, and general-purpose NLP) converge to identical solutions across all reported numerical examples, providing empirical support for the quality of the attained local optima. Regarding hyper-parameter selection, the primary criterion is maximization of validation-set likelihood; the profit alignment is reported solely as a post-hoc observation on synthetic data. We will clarify this distinction in the text and note that real-world experiments rely exclusively on likelihood-based selection. revision: partial

Circularity Check

1 step flagged

Synthetic profit alignment with likelihood is forced by data generation matching the model assumption

specific steps
  1. fitted input called prediction [Abstract]
    "For synthetic data, we also report the realized profit when using the elasticity estimate for optimal pricing, which is maximized for the same set of hyper-parameters that also maximizes the likelihood."

    Hyper-parameters are selected by maximizing likelihood on synthetic observations generated from the diagonal-plus-low-rank elasticity model. The subsequent profit computation uses the fitted elasticity matrix to set prices and simulates profit under the same synthetic demand model. Because the data-generating process matches the assumed structure, the likelihood-maximizing hyper-parameters are expected to yield the estimate that also maximizes simulated profit, so the reported coincidence is a consistency check rather than an external test.

full rationale

The core technical contribution (bi-convex likelihood maximization under diagonal-plus-low-rank structure, with three solvers) is self-contained and does not reduce to its inputs. However, the reported equivalence between likelihood-maximizing and profit-maximizing hyper-parameters on synthetic data is statistically forced once the data is generated from the same structured model; this is a fitted-input-called-prediction pattern rather than an independent check. Real-data results report only cross-validated likelihood and demand-prediction error, not pricing profit, so they provide no external validation of pricing utility. The overall derivation chain therefore contains one load-bearing circular element but remains mostly independent.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the elasticity matrix admits a diagonal-plus-low-rank decomposition and on the choice of hyper-parameters (rank and regularization strength) that are selected to maximize likelihood.

free parameters (2)
  • rank of low-rank factor
    Hyper-parameter chosen to control the complexity of cross-product interactions in the elasticity matrix.
  • regularization hyper-parameters
    Tuned to maximize the likelihood of the observed price-demand data.
axioms (1)
  • domain assumption: The mapping from log-price changes to log-demand changes is linear and the coefficient matrix is diagonal plus low-rank.
    Imposed to regularize the otherwise under-determined estimation problem from finite price-demand observations.
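A small counting sketch of why the structure helps, using the sizes quoted for the synthetic experiment (n = 100, N = 200, rank 10). The helper `count_parameters` is hypothetical, and it counts matrix entries rather than identifiable degrees of freedom, since B and C are not unique.

```python
def count_parameters(n, r):
    """Entries in a dense n x n matrix vs. in diag(s) + B @ C.T."""
    dense = n * n
    structured = n + 2 * n * r   # n diagonal entries, plus B and C of size n x r
    return dense, structured

# Sizes quoted for the synthetic experiment: 100 products, rank 10, 200 samples.
n, r, N = 100, 10, 200
dense, structured = count_parameters(n, r)
observations = n * N             # one integer demand per product and sample
```

With these sizes the dense matrix has 10,000 entries against 20,000 noisy integer observations, while the structured form has 2,100 entries, roughly a fivefold reduction.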


