Leveraging Large Language Models to Improve Precision in Randomized Controlled Trials
Pith reviewed 2026-06-28 23:47 UTC · model grok-4.3
The pith
LLM predictions can be incorporated into RCT analysis to safely improve precision without bias.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM predictions can be incorporated into RCT analysis to safely improve precision, with particular value when the RCT lacks predictive covariates or contains covariates such as text data that are well-suited to LLMs.
What carries the argument
A pipeline for best leveraging LLM predictions in RCT analysis that maintains the statistical properties of the estimator.
If this is right
- Precision gains occur without biasing the treatment effect estimator.
- The largest improvements appear in RCTs that lack strong predictive covariates.
- Text-based covariates can be processed by LLMs to yield useful adjustments.
- The approach extends methods previously applied to observational data.
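The mechanism behind these points can be sketched in a few lines. The example below is a minimal illustration, not the paper's pipeline: it assumes the LLM prediction for each unit is computed from baseline information only, and it uses a simple "subtract the prediction, then compare arms" estimator. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means(outcomes, treated):
    """Unadjusted estimate: mean outcome of treated minus mean outcome of controls."""
    y1 = [y for y, t in zip(outcomes, treated) if t]
    y0 = [y for y, t in zip(outcomes, treated) if not t]
    estimate = mean(y1) - mean(y0)
    # Conservative (Neyman-style) standard error from the two arm variances.
    std_error = sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    return estimate, std_error


def prediction_adjusted(outcomes, treated, predictions):
    """Subtract a pre-treatment prediction from each outcome, then compare arms.

    Assumption: `predictions` were produced without access to treatment
    assignment or post-treatment data (e.g. from baseline text only).
    """
    residuals = [y - p for y, p in zip(outcomes, predictions)]
    return difference_in_means(residuals, treated)


# Hypothetical toy data: eight units, alternating assignment.
treated = [1, 0, 1, 0, 1, 0, 1, 0]
predictions = [10, 12, 20, 18, 30, 31, 40, 41]            # stand-in for LLM outputs
outcomes = [12.5, 11.5, 21.5, 18.5, 32.0, 31.0, 42.5, 40.5]

print("unadjusted:", difference_in_means(outcomes, treated))
print("adjusted:  ", prediction_adjusted(outcomes, treated, predictions))
```

In this toy data the predictions track the outcomes closely, so the adjusted standard error is much smaller than the unadjusted one. When predictions carry no information about the outcome, the adjustment would be expected to give little or no gain.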
Where Pith is reading between the lines
- RCTs that already collect text at baseline could pre-specify LLM adjustment to increase statistical power.
- The same pipeline might work with other predictive models if they can be shown to integrate without bias.
- Routine use could reduce required sample sizes in trials where text or similar data is available.
- Further case studies in new domains would clarify the range of settings where gains are reliable.
Load-bearing premise
That LLM predictions can be integrated via the pipeline in a safe and rigorous way that avoids introducing bias or unreliability into the RCT estimator.
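One standard way to see when this premise can hold is the following short derivation. It is a sketch under stated assumptions, not the paper's own estimator: complete randomization of $n_1$ of $n$ units to treatment ($n_0 = n - n_1$), potential outcomes $Y_i(1)$ and $Y_i(0)$, treatment indicator $T_i$, and a prediction $f_i$ that is fixed before assignment. All symbols are introduced here for illustration.

```latex
\hat{\tau}_f
  = \frac{1}{n_1}\sum_{i=1}^{n} T_i\,\bigl(Y_i - f_i\bigr)
  - \frac{1}{n_0}\sum_{i=1}^{n} (1 - T_i)\,\bigl(Y_i - f_i\bigr),
\qquad
\mathbb{E}[T_i] = \frac{n_1}{n}.

\mathbb{E}\bigl[\hat{\tau}_f\bigr]
  = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i(1) - f_i\bigr)
  - \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i(0) - f_i\bigr)
  = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i(1) - Y_i(0)\bigr)
  = \tau .
```

The $f_i$ terms cancel only because they do not depend on $T_i$. If the predictions were generated using post-treatment information or the assignment itself, the cancellation would fail, which is exactly where the premise could break.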
What would settle it
A case study or simulation in which adding the LLM predictions produces a treatment effect estimate that differs from the unadjusted estimate by more than sampling variability would predict.
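A simulation of this kind could look like the sketch below. It is illustrative only: the data-generating process, sample size, and the deliberately miscalibrated predictions (shifted by a constant and perturbed by noise) are assumptions made here, not taken from the paper. It re-randomizes a fixed population many times and compares the unadjusted and prediction-adjusted estimates against a known true effect.

```python
import random
from statistics import mean, pvariance


def arm_difference(values, treated):
    """Mean of `values` over treated units minus mean over control units."""
    t = [v for i, v in enumerate(values) if i in treated]
    c = [v for i, v in enumerate(values) if i not in treated]
    return mean(t) - mean(c)


def simulate(n=200, reps=2000, tau=1.0, seed=0):
    """Re-randomize a fixed population and collect both estimators."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 2) for _ in range(n)]            # baseline signal
    y0 = [xi + rng.gauss(0, 1) for xi in x]            # control potential outcome
    y1 = [v + tau for v in y0]                         # constant true effect tau
    # Hypothetical stand-in for LLM predictions: informative but miscalibrated
    # (constant offset of 5 plus noise), and fixed before any assignment.
    pred = [xi + 5.0 + rng.gauss(0, 0.5) for xi in x]

    units = list(range(n))
    raw, adj = [], []
    for _ in range(reps):
        treated = set(rng.sample(units, n // 2))       # complete randomization
        obs = [y1[i] if i in treated else y0[i] for i in units]
        resid = [obs[i] - pred[i] for i in units]
        raw.append(arm_difference(obs, treated))
        adj.append(arm_difference(resid, treated))
    return {
        "mean_raw": mean(raw), "var_raw": pvariance(raw),
        "mean_adj": mean(adj), "var_adj": pvariance(adj),
    }


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions both estimators should average close to the true effect across randomizations, with the adjusted one showing a smaller variance. A systematic gap between the two averages, beyond Monte Carlo error, would be the kind of evidence that counts against the safety claim.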
Original abstract
Large language models (LLMs) are increasingly used in statistical research and applications. However, they are also notorious for unreliable or biased information. Here, we explore whether LLMs can be used to improve the precision of randomized controlled trials (RCTs) in a safe and rigorous way. Following similar work on leveraging observational data, we incorporate LLM predictions into an RCT analysis. While incorporating external predictions to improve precision is not new, the value of using LLM predictions in this manner is an open question. We develop a pipeline for best leveraging LLM predictions in this context and apply it to three different case studies. We find that these predictions can safely improve precision, particularly when the RCT lacks predictive covariates or contains covariates, such as text data, that are well-suited to LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to develop a pipeline for incorporating LLM-generated predictions into RCT estimators, following existing methods that use external predictions (such as covariate adjustment or augmented inverse-probability weighting, IPW), to improve precision without introducing bias. It demonstrates this via three case studies and concludes that the approach is safe and effective, particularly when RCTs lack strong predictive covariates or involve text data well-suited to LLMs.
Significance. If the pipeline is shown to preserve unbiasedness of the RCT estimator while delivering measurable precision gains in the case studies, the work would provide a practical extension of existing prediction-augmented RCT methods to LLMs, with potential value in settings with limited covariates or unstructured data.
major comments (2)
- [Abstract] Abstract: the central claim that LLM predictions 'can safely improve precision' rests on the pipeline avoiding bias, but no equations, identification assumptions, or estimator formulas are provided to confirm that LLM outputs are treated strictly as fixed pre-treatment functions (as required to maintain RCT validity).
- [Abstract] Abstract: the three case studies are positioned as empirical support, yet no quantitative results, error analysis, or comparison to baseline RCT estimators (e.g., precision gains or bias checks) are reported, making it impossible to evaluate whether the safety and improvement claims hold.
minor comments (1)
- [Abstract] The abstract mentions 'following similar work on leveraging observational data' but does not cite specific references for the established methods being adapted.
Simulated Author's Rebuttal
We thank the referee for their thoughtful comments. We address the two major comments on the abstract below and will revise the abstract to incorporate the requested details while preserving its conciseness.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that LLM predictions 'can safely improve precision' rests on the pipeline avoiding bias, but no equations, identification assumptions, or estimator formulas are provided to confirm that LLM outputs are treated strictly as fixed pre-treatment functions (as required to maintain RCT validity).
  Authors: The manuscript develops the pipeline by extending established methods for incorporating external predictions (treated as fixed pre-treatment functions) into RCT estimators such as covariate-adjusted or augmented IPW estimators. This structure preserves the unbiasedness guaranteed by randomization. We agree the abstract would be strengthened by briefly referencing these assumptions and the estimator form; we will revise the abstract accordingly.
  Revision: yes
- Referee: [Abstract] Abstract: the three case studies are positioned as empirical support, yet no quantitative results, error analysis, or comparison to baseline RCT estimators (e.g., precision gains or bias checks) are reported, making it impossible to evaluate whether the safety and improvement claims hold.
  Authors: The abstract summarizes the overall finding without numbers for brevity, but the full manuscript reports quantitative results from the three case studies, including precision gains relative to the unadjusted estimator and explicit checks confirming no bias is introduced. We will revise the abstract to include key quantitative highlights (e.g., reported precision improvements and bias verification) to make the empirical support more transparent.
  Revision: yes
Circularity Check
No significant circularity
The paper develops and applies a pipeline for incorporating LLM predictions as fixed pre-treatment functions into RCT estimators (following established external-prediction methods), then validates the approach empirically via three case studies. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or a self-definitional relation; the central claim rests on the case-study results rather than any internal re-derivation of its inputs.