
arxiv: 2606.00425 · v1 · pith:QQYM6ZB4new · submitted 2026-05-29 · 📊 stat.ME · stat.ML

Empirical Likelihood with Generative AI

Pith reviewed 2026-06-28 20:57 UTC · model grok-4.3

classification 📊 stat.ME stat.ML
keywords empirical likelihood · generative AI · moment conditions · projection posterior · Bernstein-von Mises theorem · consistency · Bayesian nonparametric · auxiliary data

The pith

AI-generated auxiliary data yields consistent projection posteriors in Bayesian empirical likelihood.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a Bayesian nonparametric framework for empirical likelihood that incorporates auxiliary data generated by modern AI models. This is done by exponentially tilting the empirical likelihood to accommodate prior information on the observables. Posterior inference is performed by projecting draws from a Dirichlet process onto the model defined by the moment conditions. The resulting projection posterior is shown to satisfy new Bernstein-von Mises theorems and consistency results under both vanishing and persistent prior regimes. The method is illustrated in an application to stock return prediction using overnight news headlines as auxiliary information.

Core claim

We establish a Bayesian formulation of empirical likelihood based on exponentially tilted weights that incorporates AI-generated auxiliary data, with inference obtained by projecting the Dirichlet process posterior onto the moment-restricted space; this projection posterior obeys Bernstein-von Mises and consistency theorems in vanishing-prior and persistent-prior regimes.
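The tilting step in this claim can be written out as a short worked formulation. This is a sketch in assumed notation, not the paper's own statement: the atoms $z_1,\dots,z_N$ (observed plus auxiliary points), the weights $v_k$ from a single Dirichlet process posterior draw, and the moment function $g$ are all symbols introduced here for illustration.

```latex
% Assumed notation: z_k are pooled atoms, v_k > 0 with \sum_k v_k = 1 are the
% weights of one Dirichlet process posterior draw, g(z,\theta) is the moment function.
\lambda(\theta) = \arg\min_{\eta} \sum_{k=1}^{N} v_k \exp\{\eta^\top g(z_k,\theta)\},
\qquad
p_k(\theta) = \frac{v_k \exp\{\lambda(\theta)^\top g(z_k,\theta)\}}
                   {\sum_{j=1}^{N} v_j \exp\{\lambda(\theta)^\top g(z_j,\theta)\}} .

% The first-order condition for \lambda(\theta) makes the tilted weights satisfy
% the moment conditions exactly:
\sum_{k=1}^{N} p_k(\theta)\, g(z_k,\theta) = 0 .

% The Kullback-Leibler cost of the tilt then has a closed form:
\mathrm{KL}\bigl(p(\theta)\,\|\,v\bigr)
  = -\log \sum_{k=1}^{N} v_k \exp\{\lambda(\theta)^\top g(z_k,\theta)\} \;\ge\; 0 .
```

Under these assumptions, one natural way to project a draw onto the moment-restricted model is to take the $\theta$ that minimizes this Kullback-Leibler cost. The summary above does not state the exact projection criterion, so the paper's may differ.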

What carries the argument

Projection of Dirichlet process draws onto the moment-restricted model within an exponentially tilted empirical likelihood that absorbs AI auxiliary data
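A minimal sketch of how such a projection could be computed, under stated assumptions rather than as the authors' implementation. The Gaussian toy data, the two-moment restriction in `moments`, the Dirichlet parameters used as a finite stand-in for the Dirichlet process posterior (unit mass on each real point, total mass `alpha` spread over the synthetic points), the Kullback-Leibler projection criterion, and the grid search over `theta` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: real observations plus synthetic points standing in for AI output.
x_real = rng.normal(0.3, 1.0, size=200)
x_synth = rng.normal(0.2, 1.0, size=100)
x_all = np.concatenate([x_real, x_synth])

# Assumed finite approximation: weight 1 per real point, alpha / m per synthetic point.
alpha = 25.0
dir_params = np.concatenate([np.ones(x_real.size),
                             np.full(x_synth.size, alpha / x_synth.size)])


def moments(x, theta):
    """Over-identified toy restriction: E[x - theta] = 0 and E[(x - theta)^2] = 1."""
    u = x - theta
    return np.column_stack([u, u ** 2 - 1.0])


def tilt(v, g, iters=100, tol=1e-9):
    """Minimize sum_k v_k exp(eta' g_k) over eta with a backtracking Newton method.

    Returns the tilted weights and the log of the minimized objective.
    """
    eta = np.zeros(g.shape[1])
    for _ in range(iters):
        w = v * np.exp(g @ eta)
        grad = g.T @ w
        if np.linalg.norm(grad) < tol:
            break
        hess = (g * w[:, None]).T @ g
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, w.sum()
        with np.errstate(over="ignore"):
            # Halve the step until the convex objective does not increase.
            while np.sum(v * np.exp(g @ (eta - t * step))) > f0 and t > 1e-10:
                t *= 0.5
        eta = eta - t * step
    w = v * np.exp(g @ eta)
    return w / w.sum(), np.log(w.sum())


def project(v, x, grid):
    """Pick the grid value of theta whose tilted weights are closest to v in KL."""
    best = (-np.inf, None, None)
    for theta in grid:
        p, log_m = tilt(v, moments(x, theta))
        if log_m > best[0]:  # KL(p || v) = -log_m, so a larger log_m is closer
            best = (log_m, theta, p)
    return best[1], best[2]


grid = np.linspace(x_all.mean() - 0.5, x_all.mean() + 0.5, 41)
draws = []
for _ in range(40):
    v = rng.dirichlet(dir_params)       # one draw of weights on the pooled atoms
    theta, p = project(v, x_all, grid)  # projected parameter for this draw
    draws.append(theta)
draws = np.array(draws)

# Residual of the moment conditions under the tilted weights of the last draw.
resid = np.abs(p @ moments(x_all, theta)).max()
print(draws.mean(), draws.std(), resid)
```

Each pass through the loop uses only its own weight draw, so in this sketch the draws could be computed independently of one another. A grid search is used only to keep the example dependency-free; a continuous optimizer over `theta` would be the more usual choice.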

Load-bearing premise

The auxiliary data generated by the AI model supplies useful indirect regularization that does not materially violate the moment conditions or introduce unaccounted bias.

What would settle it

Demonstrating that the projection posterior is inconsistent or fails to satisfy the Bernstein-von Mises property in a setting where the AI-generated auxiliary data introduces bias in the moment conditions.

Figures

Figures reproduced from arXiv: 2606.00425 by Jiguang Li, Sid Kankanala, Veronika Rockova.

Figure 1: Posterior draws visualization in the over-identified linear IV simulation.
Figure 2: Validation AUC across tuning parameters. Both panels report mean validation …
Figure 3: Cumulative implied mass and structural function recovery (…
Figure 4: Posterior distribution of the ATE on subsequent annual earnings of a substantial …
Original abstract

Moment conditions are widely used to identify parameters in models where the full likelihood is either unknown or intentionally left unspecified. Empirical likelihood methods address this problem by assigning probability weights to the observed data so that the sample moment conditions hold exactly. Building on this idea, we propose a nonparametric Bayesian framework based on exponentially tilted empirical likelihood. This Bayesian formulation is particularly appealing in settings where prior information is more naturally specified on the observables rather than on the underlying parameters. Such settings arise in the presence of auxiliary data sources or synthetic data generated by modern generative AI models. Inference proceeds by projecting posterior draws from a Dirichlet process onto the moment-restricted model, yielding a computationally efficient procedure that is naturally amenable to parallelization. We establish new Bernstein-von Mises and consistency theorems for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. In an application to return prediction using overnight news headlines, we show that AI-generated auxiliary data can provide a useful source of indirect regularization when informative priors on the parameter itself are unavailable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a nonparametric Bayesian framework for empirical likelihood based on exponentially tilted weights. Inference is performed by projecting posterior draws from a Dirichlet process prior on the observables onto the moment-restricted model. New Bernstein-von Mises and consistency theorems are stated for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. The method is illustrated in an application to return prediction that incorporates auxiliary data generated by a generative AI model from overnight news headlines.

Significance. If the stated theorems hold under the maintained moment conditions, the framework supplies a computationally tractable route for incorporating synthetic auxiliary data as indirect regularization within moment-based semiparametric models, extending existing empirical-likelihood and Bayesian-nonparametric tools to settings where direct priors on parameters are difficult to elicit.

minor comments (2)
  1. [Abstract] The claim that the procedure is 'naturally amenable to parallelization' is asserted without reference to the specific projection step or its computational complexity; a short clarifying sentence would strengthen the summary.
  2. The application section would benefit from an explicit statement of the moment conditions used for the return-prediction exercise and how the AI-generated headlines enter the tilted weights.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary of the manuscript and the recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation chain consists of a Dirichlet process prior on observables, exponential tilting to enforce given moment conditions, followed by projection onto the restricted model, with new BvM and consistency theorems stated for the resulting posterior under vanishing- and persistent-prior regimes. None of the load-bearing steps reduce by construction to fitted quantities, self-definitions, or self-citation chains; the moment conditions and projection operator are taken as external inputs, and the theorems are presented as extensions of standard empirical-likelihood and Bayesian-nonparametric results without internal reduction to the paper's own fitted outputs or prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters or invented entities; the framework rests on standard properties of Dirichlet processes, empirical likelihood tilting, and moment condition models.

axioms (1)
  • [standard math] Standard properties of the Dirichlet process and projection onto moment-restricted spaces hold.
    Invoked implicitly for the projection posterior construction.

pith-pipeline@v0.9.1-grok · 5700 in / 1116 out tokens · 24927 ms · 2026-06-28T20:57:08.194261+00:00 · methodology

