Empirical Likelihood with Generative AI
Pith reviewed 2026-06-28 20:57 UTC · model grok-4.3
The pith
Generative AI auxiliary data produces consistent projection posteriors in empirical likelihood.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish a Bayesian formulation of empirical likelihood based on exponentially tilted weights that incorporates AI-generated auxiliary data, with inference obtained by projecting the Dirichlet process posterior onto the moment-restricted space; this projection posterior obeys Bernstein-von Mises and consistency theorems in vanishing-prior and persistent-prior regimes.
What carries the argument
Projection of Dirichlet process draws onto the moment-restricted model within an exponentially tilted empirical likelihood that absorbs AI auxiliary data
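For reference, the tilted weights behind this projection can be written out as below. The weight definition and the gradient identity follow the supplementary proof excerpt of the paper, where v_k denotes the Dirichlet process weight on the k-th support point and g_k(θ) the moment function evaluated there. The explicit form of ψ is an assumption, inferred from the paper's description of ψ as a log cumulant generating function.

```latex
p_k(\eta,\theta) = \frac{v_k \exp\{\eta^\top g_k(\theta)\}}{\sum_j v_j \exp\{\eta^\top g_j(\theta)\}},
\qquad
\psi(\eta,\theta) = \log \sum_k v_k \exp\{\eta^\top g_k(\theta)\},
\qquad
\nabla_\eta \psi(\eta,\theta) = \sum_k p_k(\eta,\theta)\, g_k(\theta).
```

Under this form, setting the gradient to zero is exactly the requirement that the tilted weights satisfy the moment condition, so the tilting parameter η can be found by minimizing the convex function ψ at a given θ.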
Load-bearing premise
The auxiliary data generated by the AI model supplies useful indirect regularization that does not materially violate the moment conditions or introduce unaccounted bias.
What would settle it
Demonstrating that the projection posterior is inconsistent or fails to satisfy the Bernstein-von Mises property in a setting where the AI-generated auxiliary data introduces bias in the moment conditions.
Original abstract
Moment conditions are widely used to identify parameters in models where the full likelihood is either unknown or intentionally left unspecified. Empirical likelihood methods address this problem by assigning probability weights to the observed data so that the sample moment conditions hold exactly. Building on this idea, we propose a nonparametric Bayesian framework based on exponentially tilted empirical likelihood. This Bayesian formulation is particularly appealing in settings where prior information is more naturally specified on the observables rather than on the underlying parameters. Such settings arise in the presence of auxiliary data sources or synthetic data generated by modern generative AI models. Inference proceeds by projecting posterior draws from a Dirichlet process onto the moment-restricted model, yielding a computationally efficient procedure that is naturally amenable to parallelization. We establish new Bernstein-von Mises and consistency theorems for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. In an application to return prediction using overnight news headlines, we show that AI-generated auxiliary data can provide a useful source of indirect regularization when informative priors on the parameter itself are unavailable.
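A minimal sketch of the tilting step the abstract describes, assuming a scalar moment function g(x, θ) = x − θ and a finite Dirichlet approximation to the Dirichlet process posterior in which the auxiliary points share a total prior mass `prior_mass`. The function names, the toy data, and the damped Newton solver are illustrative assumptions and not the authors' implementation.

```python
import math
import random


def dirichlet_weights(n_obs, n_aux, prior_mass, rng):
    """Weights over observed points followed by auxiliary points.

    Assumption: a finite Dirichlet approximation in which each observed
    point has shape 1 and the auxiliary points share `prior_mass` equally.
    """
    shapes = [1.0] * n_obs + [prior_mass / n_aux] * n_aux
    gammas = [rng.gammavariate(a, 1.0) for a in shapes]
    total = sum(gammas)
    return [gk / total for gk in gammas]


def psi_and_probs(eta, weights, g):
    """Return psi(eta) = log sum_k v_k exp(eta * g_k) and the tilted weights."""
    shift = max(eta * gk for gk in g)  # stabilizes the exponentials
    terms = [v * math.exp(eta * gk - shift) for v, gk in zip(weights, g)]
    total = sum(terms)
    return shift + math.log(total), [t / total for t in terms]


def tilt(weights, g, tol=1e-10, max_iter=100):
    """Tilt `weights` so that sum_k p_k * g_k = 0, via damped Newton on psi."""
    support = [gk for v, gk in zip(weights, g) if v > 0.0]
    if not min(support) < 0.0 < max(support):
        raise ValueError("zero must lie strictly inside the range of g")
    eta = 0.0
    for _ in range(max_iter):
        value, p = psi_and_probs(eta, weights, g)
        mean = sum(pk * gk for pk, gk in zip(p, g))  # gradient of psi
        if abs(mean) < tol:
            return p, eta
        var = sum(pk * (gk - mean) ** 2 for pk, gk in zip(p, g))  # Hessian
        step, t = -mean / var, 1.0
        # Backtrack until psi decreases enough (Armijo condition).
        while (t > 1e-12 and
               psi_and_probs(eta + t * step, weights, g)[0]
               > value + 1e-4 * t * step * mean):
            t *= 0.5
        eta += t * step
    raise RuntimeError("Newton iterations did not converge")


# Toy data (hypothetical): observed sample plus a stand-in for AI-generated data.
rng = random.Random(0)
observed = [rng.gauss(0.3, 1.0) for _ in range(40)]
auxiliary = [rng.gauss(0.0, 1.0) for _ in range(20)]
data = observed + auxiliary

theta = 0.0  # candidate parameter value at which the moment is imposed
g = [x - theta for x in data]
v = dirichlet_weights(len(observed), len(auxiliary), prior_mass=5.0, rng=rng)
p, eta = tilt(v, g)
moment = sum(pk * gk for pk, gk in zip(p, g))  # zero up to solver tolerance
```

The sketch only covers the inner tilting step at a fixed θ; how the paper selects θ for each posterior draw is not shown here.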
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a nonparametric Bayesian framework for empirical likelihood based on exponentially tilted weights. Inference is performed by projecting posterior draws from a Dirichlet process prior on the observables onto the moment-restricted model. New Bernstein-von Mises and consistency theorems are stated for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. The method is illustrated in an application to return prediction that incorporates auxiliary data generated by a generative AI model from overnight news headlines.
Significance. If the stated theorems hold under the maintained moment conditions, the framework supplies a computationally tractable route for incorporating synthetic auxiliary data as indirect regularization within moment-based semiparametric models, extending existing empirical-likelihood and Bayesian-nonparametric tools to settings where direct priors on parameters are difficult to elicit.
minor comments (2)
- [Abstract] The claim that the procedure is 'naturally amenable to parallelization' is asserted without reference to the specific projection step or its computational complexity; a short clarifying sentence would strengthen the summary.
- The application section would benefit from an explicit statement of the moment conditions used for the return-prediction exercise and how the AI-generated headlines enter the tilted weights.
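A toy sketch of what both comments ask for, under stated assumptions: a hypothetical just-identified moment g(y, x; θ) = x(y − θx) is written out explicitly, and synthetic labels enter through extra support points that carry a total prior mass `PRIOR_MASS`. Because the model is just-identified in this sketch, each weighted estimating equation is solved exactly and no further tilting is needed. Each draw depends only on its own seed, which is the property that makes the draws easy to distribute. The data, the stand-in for AI-generated labels, and all names are illustrative and do not reproduce the paper's return-prediction application.

```python
import random
from concurrent.futures import ThreadPoolExecutor

rng = random.Random(1)
N_OBS, N_AUX, PRIOR_MASS = 60, 30, 10.0

# Toy observed data (hypothetical): y = 0.5 * x + noise.
x_obs = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
y_obs = [0.5 * x + rng.gauss(0.0, 1.0) for x in x_obs]

# Stand-in for AI-generated labels on resampled covariates; no model is called.
x_aux = [rng.choice(x_obs) for _ in range(N_AUX)]
y_aux = [0.3 * x + rng.gauss(0.0, 0.5) for x in x_aux]

xs, ys = x_obs + x_aux, y_obs + y_aux
# Assumption: shape 1 per observed point, prior mass split over auxiliary points.
shapes = [1.0] * N_OBS + [PRIOR_MASS / N_AUX] * N_AUX


def one_draw(seed):
    """One draw of theta solving sum_k v_k * x_k * (y_k - theta * x_k) = 0."""
    local = random.Random(seed)  # independent stream per draw
    v = [local.gammavariate(a, 1.0) for a in shapes]  # unnormalized Dirichlet
    num = sum(vk * x * y for vk, x, y in zip(v, xs, ys))
    den = sum(vk * x * x for vk, x in zip(v, xs))
    return num / den  # normalization of v cancels in the ratio


# Draws do not share state, so they can be mapped over workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    draws = list(pool.map(one_draw, range(500)))

draws.sort()
post_mean = sum(draws) / len(draws)
interval = (draws[12], draws[487])  # roughly the central 95% of the draws
```

The thread pool only illustrates the structure; for CPU-bound pure-Python work, a process pool or a cluster scheduler would be needed to obtain an actual speedup.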
Simulated Author's Rebuttal
We thank the referee for the supportive summary of the manuscript and the recommendation of minor revision. No specific major comments were provided in the report.
Circularity Check
No significant circularity detected
full rationale
The derivation chain consists of a Dirichlet process prior on observables, exponential tilting to enforce given moment conditions, followed by projection onto the restricted model, with new BvM and consistency theorems stated for the resulting posterior under vanishing- and persistent-prior regimes. None of the load-bearing steps reduce by construction to fitted quantities, self-definitions, or self-citation chains; the moment conditions and projection operator are taken as external inputs, and the theorems are presented as extensions of standard empirical-likelihood and Bayesian-nonparametric results without internal reduction to the paper's own fitted outputs or prior work by the same authors.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Standard properties of the Dirichlet process and of projection onto moment-restricted spaces hold.