Empirical Likelihood with Generative AI

Jiguang Li; Sid Kankanala; Veronika Rockova

arxiv: 2606.00425 · v1 · pith:QQYM6ZB4new · submitted 2026-05-29 · 📊 stat.ME · stat.ML

Pith reviewed 2026-06-28 20:57 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords empirical likelihood · generative AI · moment conditions · projection posterior · Bernstein-von Mises theorem · consistency · Bayesian nonparametric · auxiliary data

The pith

Auxiliary data from generative AI can be folded into empirical likelihood through a projection posterior that is shown to be consistent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a Bayesian nonparametric framework for empirical likelihood that incorporates auxiliary data generated by modern AI models. This is done by exponentially tilting the empirical likelihood to accommodate prior information on the observables. Posterior inference is performed by projecting draws from a Dirichlet process onto the model defined by the moment conditions. The resulting projection posterior is shown to satisfy new Bernstein-von Mises theorems and consistency results under both vanishing and persistent prior regimes. The method is illustrated in an application to stock return prediction using overnight news headlines as auxiliary information.

Core claim

We establish a Bayesian formulation of empirical likelihood based on exponentially tilted weights that incorporates AI-generated auxiliary data, with inference obtained by projecting the Dirichlet process posterior onto the moment-restricted space; this projection posterior obeys Bernstein-von Mises and consistency theorems in vanishing-prior and persistent-prior regimes.

What carries the argument

Projection of Dirichlet process draws onto the moment-restricted model, within an exponentially tilted empirical likelihood that absorbs AI-generated auxiliary data.

Load-bearing premise

The auxiliary data generated by the AI model supplies useful indirect regularization that does not materially violate the moment conditions or introduce unaccounted bias.

What would settle it

Demonstrating that the projection posterior is inconsistent or fails to satisfy the Bernstein-von Mises property in a setting where the AI-generated auxiliary data introduces bias in the moment conditions.

Figures

Figures reproduced from arXiv: 2606.00425 by Jiguang Li, Sid Kankanala, Veronika Rockova.

**Figure 2.** Validation AUC across tuning parameters. Both panels report mean validation ...

**Figure 3.** Cumulative implied mass and structural function recovery ( ...

**Figure 4.** Posterior distribution of the ATE on subsequent annual earnings of a substantial ...

Original abstract

Moment conditions are widely used to identify parameters in models where the full likelihood is either unknown or intentionally left unspecified. Empirical likelihood methods address this problem by assigning probability weights to the observed data so that the sample moment conditions hold exactly. Building on this idea, we propose a nonparametric Bayesian framework based on exponentially tilted empirical likelihood. This Bayesian formulation is particularly appealing in settings where prior information is more naturally specified on the observables rather than on the underlying parameters. Such settings arise in the presence of auxiliary data sources or synthetic data generated by modern generative AI models. Inference proceeds by projecting posterior draws from a Dirichlet process onto the moment-restricted model, yielding a computationally efficient procedure that is naturally amenable to parallelization. We establish new Bernstein-von Mises and consistency theorems for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. In an application to return prediction using overnight news headlines, we show that AI-generated auxiliary data can provide a useful source of indirect regularization when informative priors on the parameter itself are unavailable.
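The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a just-identified linear moment condition E[x(y - x'θ)] = 0, in which case projecting a weighted empirical measure onto the moment-restricted model reduces to solving the weighted moment equation. It also assumes the Dirichlet process base measure is the empirical distribution of `m` synthetic pairs carrying total prior mass `alpha`. The function name, its arguments, and the choice of moment condition are all hypothetical.

```python
import numpy as np

def projection_posterior_draws(X, y, X_aux, y_aux, alpha, n_draws=1000, seed=0):
    """Sketch: draws of theta for the moment condition E[x (y - x'theta)] = 0.

    Assumption (not from the paper): the DP posterior is supported on the
    observed points (mass 1 each) and the synthetic points (mass alpha/m each).
    """
    rng = np.random.default_rng(seed)
    n, m = len(y), len(y_aux)
    X_all = np.vstack([X, X_aux])
    y_all = np.concatenate([y, y_aux])
    # Dirichlet parameters over all atoms: observed first, then synthetic.
    conc = np.concatenate([np.ones(n), np.full(m, alpha / m)])
    draws = np.empty((n_draws, X.shape[1]))
    for b in range(n_draws):
        # Normalized Gamma variates give one Dirichlet weight vector.
        g = rng.gamma(shape=conc)
        v = g / g.sum()
        # Projection step: solve sum_k v_k x_k (y_k - x_k' theta) = 0 for theta.
        XtW = X_all.T * v
        draws[b] = np.linalg.solve(XtW @ X_all, XtW @ y_all)
    return draws
```

Because each draw only needs its own weight vector, the loop body can be distributed across workers without coordination, which is consistent with the abstract's remark about parallelization. Over-identified models would need an explicit tilting step in place of the linear solve.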

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper folds generative AI auxiliary data into exponentially tilted empirical likelihood via Dirichlet process projection and claims new BvM plus consistency theorems under vanishing and persistent priors.

The main contribution here is a nonparametric Bayesian empirical likelihood setup that lets you use AI-generated auxiliary data when direct priors on the parameters are awkward. They start with a Dirichlet process on the observables, apply exponential tilting to enforce the moments, then project the posterior draws onto the restricted model. This yields a parallelizable procedure and they state new Bernstein-von Mises and consistency results for both vanishing-prior and persistent-prior regimes.

The approach is sensible for settings where synthetic data can supply indirect regularization without needing to specify a parameter prior. The return-prediction application with overnight news headlines shows a concrete use case. The framework builds cleanly on existing empirical likelihood and Bayesian nonparametric tools, and the stress-test note correctly flags no internal circularity once the moment conditions are accepted.

The soft spots are the usual ones for an abstract-heavy read: the actual derivations, the precise assumptions on the AI data, and any post-hoc tuning choices are not visible, so it is impossible to judge how tight the theorems really are or whether the generated data introduces unaccounted bias into the moments. The application would also need clearer diagnostics on data quality and sensitivity.

This is for researchers working on moment-based inference, empirical likelihood, or Bayesian methods that incorporate machine-generated data. A reader in those areas would get a usable new tool to think about. It is worth sending to peer review; the combination is new enough and the claims specific enough that referees can usefully check the details.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a nonparametric Bayesian framework for empirical likelihood based on exponentially tilted weights. Inference is performed by projecting posterior draws from a Dirichlet process prior on the observables onto the moment-restricted model. New Bernstein-von Mises and consistency theorems are stated for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. The method is illustrated in an application to return prediction that incorporates auxiliary data generated by a generative AI model from overnight news headlines.

Significance. If the stated theorems hold under the maintained moment conditions, the framework supplies a computationally tractable route for incorporating synthetic auxiliary data as indirect regularization within moment-based semiparametric models, extending existing empirical-likelihood and Bayesian-nonparametric tools to settings where direct priors on parameters are difficult to elicit.

minor comments (2)

[Abstract] The claim that the procedure is 'naturally amenable to parallelization' is asserted without reference to the specific projection step or its computational complexity; a short clarifying sentence would strengthen the summary.
The application section would benefit from an explicit statement of the moment conditions used for the return-prediction exercise and how the AI-generated headlines enter the tilted weights.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary of the manuscript and the recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

The derivation chain consists of a Dirichlet process prior on observables, exponential tilting to enforce given moment conditions, followed by projection onto the restricted model, with new BvM and consistency theorems stated for the resulting posterior under vanishing- and persistent-prior regimes. None of the load-bearing steps reduce by construction to fitted quantities, self-definitions, or self-citation chains; the moment conditions and projection operator are taken as external inputs, and the theorems are presented as extensions of standard empirical-likelihood and Bayesian-nonparametric results without internal reduction to the paper's own fitted outputs or prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters or invented entities; the framework rests on standard properties of Dirichlet processes, empirical likelihood tilting, and moment condition models.

axioms (1)

[standard math] Standard properties of the Dirichlet process and projection onto moment-restricted spaces hold. Invoked implicitly for the projection posterior construction.

pith-pipeline@v0.9.1-grok · 5700 in / 1116 out tokens · 24927 ms · 2026-06-28T20:57:08.194261+00:00 · methodology

Reference graph

Works this paper leans on

73 extracted references · 5 canonical work pages · 2 internal anchors

[1] Ackerberg, D. A., K. Caves, and G. Frazer (2015). Identification properties of recent production function estimators. Econometrica 83(6), 2411–2451.

[2] Angrist, J. D. and W. N. Evans (1998). Children and their parents’ labor supply: Evidence from exogenous variation in family size. American Economic Review 88(3), 450–477.

[3] Arellano, M. and S. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies 58(2), 277–297.

[4] Astfalck, L., D. Sen, S. Patra, E. Cripps, and D. Dunson (2026). Posterior projection for inference in constrained spaces. arXiv:1812.05741.

[5] Banks, J., R. Blundell, and A. Lewbel (1997). Quadratic Engel curves and consumer demand. The Review of Economics and Statistics 79(4), 527–539.

[6] Bansal, R. and S. Viswanathan (1993). No arbitrage and arbitrage pricing: A new approach. The Journal of Finance 48(4), 1231–1262.

[7] Berry, S., J. Levinsohn, and A. Pakes (1995). Automobile prices in market equilibrium. Econometrica 63(4), 841–890.

[8] Blundell, R. and S. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87(1), 115–143.

[9] Blundell, R., M. Browning, and C. Meghir (1994). Consumer demand and the life-cycle allocation of household expenditures. The Review of Economic Studies 61(1), 57–80.

[10] Blundell, R., X. Chen, and D. Kristensen (2007). Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica 75(6), 1613–1669.

[11] Bøler, E. A., A. Moxnes, and K. H. Ulltveit-Moe (2015). R&D, international sourcing, and the joint impact on firm performance. American Economic Review 105(12), 3704–3739.

[12] Bornn, L., N. Shephard, and R. Solgi (2019). Moment conditions and Bayesian nonparametrics. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81(1), 5–43.

[13] Bybee, J. L. (2025). The ghost in the machine: Generating beliefs with large language models. Working paper, February 2025 version.

[14] Chakraborty, M. and S. Ghosal (2022). Rates and coverage for monotone densities using projection-posterior. Bernoulli 28(2), 1093–1119.

[15] Chamberlain, G. and G. W. Imbens (2003). Nonparametric applications of Bayesian inference. Journal of Business & Economic Statistics 21(1), 12–18.

[16] Chen, M.-H. and J. G. Ibrahim (2003). Conjugate priors for generalized linear models. Statistica Sinica 13(2), 461–476.

[17] Chen, Y., B. T. Kelly, and D. Xiu (2022). Expected returns and large language models. SSRN working paper.

[18] Cheng, J., J. Qin, and B. Zhang (2009). Semiparametric estimation and inference for distributional and general treatment effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(4), 881–904.

[19] Chernozhukov, V. and C. Hansen (2005). An IV model of quantile treatment effects. Econometrica 73(1), 245–261.

[20] Chib, S. and E. Greenberg (2010). Additive cubic spline regression with Dirichlet process mixture errors. Journal of Econometrics 156(2), 322–336.

[21] Chib, S., M. Shin, and A. Simoni (2018). Bayesian estimation and comparison of moment condition models. Journal of the American Statistical Association 113(524), 1656–1668.

[22] Chib, S., M. Shin, and A. Simoni (2022). Bayesian estimation and comparison of conditional moment models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 84(3), 740–764.

[23] Choi, J. and S. O’Hagan (2026). Supercharging Bayesian inference with reliable AI-informed priors. arXiv:2605.09834.

[24] Diaconis, P. and D. Ylvisaker (1979). Conjugate priors for exponential families. The Annals of Statistics 7(2), 269–281.

[25] Doraszelski, U. and J. Jaumandreu (2013). R&D and productivity: Estimating endogenous productivity. The Review of Economic Studies 80(4), 1338–1383.

[26] Efron, B. (1981). Nonparametric standard errors and confidence intervals. Canadian Journal of Statistics 9(2), 139–158.

[27] Fedyk, A., A. Kakhbod, P. Li, and U. Malmendier (2024). AI and perception biases in investments: An experimental study. SSRN working paper.

[28] Ferguson, T. S. (1974). Prior distributions on spaces of probability measures. The Annals of Statistics 2(4), 615–629.

[29] Fong, E., S. Lyddon, and C. Holmes (2019). Scalable nonparametric sampling from multimodal posteriors with the posterior bootstrap. In Proceedings of the 36th International Conference on Machine Learning, Volume 97 of Proceedings of Machine Learning Research, pp. 1952–1962. PMLR.

[30] Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50(4), 1029–1054.

[31] Honoré, B. E. and M. Weidner (2025). Moment conditions for dynamic panel logit models with fixed effects. The Review of Economic Studies 92(5), 3112–3137.

[32] Huang, D., N. Stein, D. B. Rubin, and S. C. Kou (2020). Catalytic prior distributions with application to generalized linear models. Proceedings of the National Academy of Sciences 117(22), 12004–12010.

[33] Imbens, G. W. (2002). Generalized method of moments and empirical likelihood. Journal of Business & Economic Statistics 20(4), 493–506.

[34] Imbens, G. W., R. H. Spady, and P. Johnson (1998). Information-theoretic approaches to inference in moment condition models. Econometrica 66(2), 333–357.

[35] Ishwaran, H. and M. Zarepour (2002). Exact and approximate sum representations for the Dirichlet process. The Canadian Journal of Statistics / La Revue Canadienne de Statistique 30(2), 269–283.

[36] Kankanala, S. (2025). Generalized Bayes in conditional moment restriction models. arXiv preprint arXiv:2510.01036.

[37] Kim, E., S. N. MacEachern, and M. Peruggia (2026). Regularized exponentially tilted empirical likelihood for Bayesian inference. arXiv preprint arXiv:2312.17015.

[38] Kitamura, Y. and T. Otsu (2011). Bayesian analysis of moment restriction models using nonparametric priors. Unpublished manuscript, Department of Economics, Yale University.

[39] Kitamura, Y. and M. Stutzer (1997). An information-theoretic alternative to generalized method of moments estimation. Econometrica 65(4), 861–874.

[40] Koenker, R. and G. Bassett Jr (1978). Regression quantiles. Econometrica 46(1), 33–50.

[41] Lazar, N. A. (2003). Bayesian empirical likelihood. Biometrika 90(2), 319–326.

[42] Levinsohn, J. and A. Petrin (2003). Estimating production functions using inputs to control for unobservables. The Review of Economic Studies 70(2), 317–341.

[43] Liao, Y. and W. Jiang (2011). Posterior consistency of nonparametric conditional moment restricted models. The Annals of Statistics, 3003–3031.

[44] Lin, L. and D. B. Dunson (2014). Bayesian monotone regression using Gaussian process projection. Biometrika 101(2), 303–317.

[45] Lopez-Lira, A. and Y. Tang (2023). Can ChatGPT forecast stock price movements? Return predictability and large language models. SSRN working paper.

[46] Lyddon, S. P., C. C. Holmes, and S. G. Walker (2019). General Bayesian updating and the loss-likelihood bootstrap. Biometrika 106(2), 465–478.

[47] Manning, B. S., K. Zhu, and J. J. Horton (2024). Automated social science: Language models as scientist and subjects. NBER Working Paper 32381, National Bureau of Economic Research.

[48] Newey, W. K. and J. L. Powell (2003). Instrumental variable estimation of nonparametric models. Econometrica 71(5), 1565–1578.

[49] Newey, W. K. and R. J. Smith (2004). Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72(1), 219–255.

[50] Newton, M. A. (1991). The Weighted Likelihood Bootstrap and an Algorithm for Prepivoting. Ph.D. thesis, Department of Statistics, University of Washington, Seattle, WA.

[51] Newton, M. A. and A. E. Raftery (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society: Series B (Methodological) 56(1), 3–26.

[52] O’Hagan, S. and V. Ročková (2025). AI-powered Bayesian inference. arXiv preprint arXiv:2502.19231.

[53] Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75(2), 237–249.

[54] Owen, A. B. (2001). Empirical Likelihood. Chapman and Hall/CRC.

[55] Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics 9(1), 130–134.

[56] Salton, G. and C. Buckley (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management 24(5), 513–523.

[57] Schennach, S. M. (2005). Bayesian exponentially tilted empirical likelihood. Biometrika 92(1), 31–46.

[58] Schennach, S. M. (2007). Point estimation with exponentially tilted empirical likelihood. The Annals of Statistics 35(2), 634–672.

[59] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica 4(2), 639–650.

[60] Tang, R. and Y. Yang (2022). Bayesian inference for risk minimization via exponentially tilted empirical likelihood. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 84(4), 1257–1286.

[61] Theobald, C. M. (1974). Generalizations of mean square error applied to ridge regression. Journal of the Royal Statistical Society: Series B (Methodological) 36(1), 103–106.

[62] Yiu, A., R. J. B. Goudie, and B. D. M. Tom (2020). Inference under unequal probability sampling with the Bayesian exponentially tilted empirical likelihood. Biometrika 107(4), 857–873.

[63] Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In P. K. Goel and A. Zellner (Eds.), Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pp. 233–243. New York: Elsevier Science Publishers.

[64] The operator norm $\|\nabla^3_\eta \psi(\bar\eta, \theta)\|$ is uniformly bounded for $\eta \in \Lambda$ and $\theta \in \mathcal{N}$.

[65] For $\theta \in \mathcal{N}$, $\|\lambda(\theta)\| \le C_1 \|S(\theta)\|$ for some constant $C_1$. Proof. Define the tilted weight $p_k(\eta, \theta) := \dfrac{v_k e^{\eta^\top g_k(\theta)}}{\sum_j v_j e^{\eta^\top g_j(\theta)}}$ and let $\mu(\eta, \theta) := \nabla_\eta \psi(\eta, \theta) = \sum_k p_k(\eta, \theta) g_k(\theta)$. By viewing $\psi(\eta, \theta)$ as the log cumulant generating function of $g_k(\theta)$, we can upper bound the operator norm of the third derivative tensor as $\|\nabla^3 \psi(\eta, \theta)\| = \sup_{\|u\| = \|v\| = \|w\| = 1} \big| \sum_k p_k \prod_{t \in \{u, v, w\}} \cdots$ ...
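The tilted weights in the proof fragment above lend themselves to a short numerical sketch. The code below assumes, consistently with the stated gradient $\mu = \nabla_\eta \psi = \sum_k p_k g_k$, that $\psi(\eta, \theta) = \log \sum_k v_k e^{\eta^\top g_k(\theta)}$, and it finds the tilt $\eta$ at which the tilted moments vanish by damped Newton steps. The function names, the backtracking rule, and the tolerances are illustrative choices rather than anything taken from the paper; `G` holds the rows $g_k(\theta)$ at a fixed $\theta$, and the weights `v` are assumed strictly positive.

```python
import numpy as np

def psi(v, G, eta):
    """log sum_k v_k exp(eta' g_k), evaluated with a log-sum-exp shift."""
    a = np.log(v) + G @ eta
    shift = a.max()
    return shift + np.log(np.exp(a - shift).sum())

def tilted_weights(v, G, eta):
    """p_k proportional to v_k exp(eta' g_k), normalized to sum to one."""
    a = np.log(v) + G @ eta
    p = np.exp(a - a.max())
    return p / p.sum()

def solve_tilt(v, G, tol=1e-10, max_iter=100):
    """Damped Newton search for eta with sum_k p_k(eta) g_k = 0."""
    eta = np.zeros(G.shape[1])
    for _ in range(max_iter):
        p = tilted_weights(v, G, eta)
        mu = p @ G                      # gradient of psi: tilted mean of g
        if np.linalg.norm(mu) < tol:
            break
        Gc = G - mu
        H = (Gc.T * p) @ Gc             # Hessian of psi: tilted covariance of g
        step = np.linalg.solve(H, mu)
        t = 1.0
        # Backtrack until psi does not increase (psi is convex in eta).
        while psi(v, G, eta - t * step) > psi(v, G, eta) and t > 1e-8:
            t *= 0.5
        eta = eta - t * step
    return eta, tilted_weights(v, G, eta)
```

A solution exists only when the origin lies in the interior of the convex hull of the rows of `G`; otherwise the iterations drift without converging, and the sketch makes no attempt to detect that case.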
[66] Use only the supplied headlines and source codes.

[67] You are not asked to infer the realized future return exactly; instead, score the news tone a plausible market participant might perceive overnight.

[68] Many nights are mixed or weakly informative. Most draws should be near zero. Extreme values should be rare and reserved for clearly strong catalysts.

[69] Administrative, exchange, filing, promotional, or routine press-release items are usually weaker evidence than independent reported news.

[70] Analyst rating / price-target changes are moderate evidence.

[71] Strong earnings/guidance surprises, major litigation/regulatory outcomes, financing stress, M&A, management shocks, outages, or clearly material product news can justify larger |z|.

[72] Draws should vary modestly around your central judgment: more dispersion when the evidence is mixed or ambiguous, tighter draws when the catalyst is clear.

[73] GPT-ETEL (synthetic-label)

Do not output explanations. E.3 Additional Experiment: Generating Synthetic News with Synthetic Labels. In this manuscript, we primarily use generative AI in a conditional manner: given observed covariates, we prompt the model to generate synthetic labels. In principle, one could instead ... Table 6: Test-set performance in overnight news prediction. Results ...
