CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation

Ayush Sawarni; Jiyuan Tan; Vasilis Syrgkanis

arxiv: 2602.20571 · v2 · pith:L7D72N3Nnew · submitted 2026-02-24 · 💻 cs.AI

Pith reviewed 2026-05-15 20:32 UTC · model grok-4.3

classification 💻 cs.AI

keywords: causal inference, benchmark, identification, estimation, LLM evaluation, real-world data, causal reasoning, disentangled metrics

The pith

A benchmark of 173 real-world queries scores causal identification and numerical estimation separately to diagnose AI failures in causal analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many existing benchmarks for causal inference judge systems only on final numerical outputs like average treatment effects, mixing up two separate tasks. The new benchmark collects 173 queries from actual research papers and textbooks, each asking for both a detailed identification plan specifying variables and strategy, and the computed estimate. Scoring the two parts independently lets evaluators see whether a system fails at figuring out the right causal approach or at doing the math correctly. Tests on a current large language model found it picked the right overall strategy in 79 percent of cases but produced fully correct identification details in just 34 percent, pointing to detailed design work as the harder part. The resource is released publicly to encourage stronger automated causal systems.

Core claim

The paper claims that by curating queries from published causal studies and requiring separate outputs for identification specifications and estimates, the benchmark can distinguish between failures in formulating valid research designs and errors in implementing them numerically on data.

What carries the argument

The structured identification specification, which requires naming the causal strategy along with treatment, outcome, control variables, and all design-specific elements.
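
As a rough illustration, a specification of this kind could be held in a small record. The sketch below is an assumption, not the benchmark's actual schema: the class name `IdentificationSpec`, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentificationSpec:
    """Hypothetical record for one identification specification."""
    strategy: str                    # name of the causal strategy
    treatment: str                   # treatment variable
    outcome: str                     # outcome variable
    controls: frozenset = frozenset()  # control variables, order-free
    design_elements: tuple = ()      # design-specific (name, value) pairs


# Made-up example values, for illustration only.
spec = IdentificationSpec(
    strategy="regression_discontinuity",
    treatment="won_election",
    outcome="next_vote_share",
    controls=frozenset({"district_population"}),
    design_elements=(("running_variable", "vote_margin"), ("cutoff", "0")),
)
```

Keeping the record immutable and the controls unordered means two specifications can be compared with plain equality, which is convenient if each component is to be scored separately.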

If this is right

AI systems can be tested for precise weaknesses in causal reasoning rather than overall performance.
Development of causal AI can focus on improving detailed research design formulation.
Real-world applicability increases because queries come from actual published studies.
Granular metrics allow tracking progress on identification separately from estimation accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future work could apply similar disentangled evaluation to other reasoning domains like planning or optimization.
Connecting the benchmark to causal discovery tools might help systems generate better specifications automatically.
The method highlights the need for benchmarks that reflect the full pipeline of empirical research rather than isolated tasks.

Load-bearing premise

The extracted ground-truth identification specifications and estimates from the source papers are accurate and complete.

What would settle it

A systematic review finding that many of the benchmark's ground-truth labels do not match what the original authors intended or that alternative valid specifications exist for the same queries.

Original abstract

Many benchmarks for automated causal inference evaluate a system's performance based on a single numerical output, such as an Average Treatment Effect (ATE). This approach conflates two distinct steps in causal analysis: identification (formulating a valid research design under stated assumptions) and estimation (implementing that design numerically on finite data). We introduce CausalReasoningBenchmark, a benchmark of 173 queries across 132 real-world datasets, curated from 79 peer-reviewed research papers and three widely-used causal-inference textbooks. For each query a system must produce (i) a structured identification specification that names the strategy, the treatment, outcome, and control variables, and all design-specific elements, and (ii) a point estimate with a standard error. By scoring these two components separately, our benchmark enables granular diagnosis: it distinguishes failures in causal reasoning from errors in numerical execution. Baseline results with a state-of-the-art LLM show that, while the model correctly identifies the high-level strategy in 79% of cases, full identification-specification correctness drops to only 34%, revealing that the bottleneck lies in the nuanced details of research design rather than in computation. CausalReasoningBenchmark is publicly available on Hugging Face and is designed to foster the development of more robust automated causal-inference systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This benchmark separates identification specs from estimates on 173 real-study queries and shows that the one LLM tested fails mostly on design details, but the paper reports no validation steps for the extracted labels.

The useful part is the split scoring: models must output a full structured identification spec (strategy, variables, design elements) plus a separate point estimate with SE, then get graded on each. That lets you see whether errors come from bad causal reasoning or just bad arithmetic. They built it from 79 published papers and three textbooks, so the targets are external rather than self-generated, and they release the 173 queries publicly on Hugging Face. The baseline numbers are concrete: 79% high-level strategy correct but only 34% full spec correct, which lines up with the claim that nuanced design details are the current bottleneck for LLMs.

Referee Report

2 major / 0 minor

Summary. The paper introduces CausalReasoningBenchmark, a collection of 173 queries from 132 real-world datasets curated from 79 peer-reviewed papers and three causal-inference textbooks. Each query requires a system to output (i) a structured identification specification detailing the causal strategy, treatment, outcome, and control variables, and (ii) a point estimate with standard error. By evaluating these components separately, the benchmark aims to distinguish between errors in causal reasoning (identification) and numerical computation (estimation). Baseline results using a state-of-the-art LLM indicate 79% accuracy in identifying the high-level strategy but only 34% correctness in the full identification specification.

Significance. If the ground-truth labels prove reliable, this benchmark provides a significant advancement by enabling granular evaluation of causal inference capabilities in AI systems. It highlights that current models struggle with the detailed aspects of research design rather than computation alone. The use of real-world examples from published papers and textbooks adds ecological validity, and public release on Hugging Face facilitates further research and reproducibility.

major comments (2)

[Benchmark construction] The description of how the 173 queries and their ground-truth labels were extracted from the 79 papers and 3 textbooks lacks essential details on the extraction protocol, inter-annotator agreement metrics, criteria for query selection, and handling of ambiguous or incomplete specifications in the source materials. Since the central claim relies on these labels being accurate references for scoring the 34% full-specification correctness, this omission undermines confidence in the benchmark's reliability.
[Evaluation and baselines] It is unclear how the identification specification is scored for correctness, particularly what constitutes a full match versus partial credit for the nuanced details. This affects the interpretation of the drop from 79% high-level strategy identification to 34% full correctness.
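
Inter-annotator agreement of the kind requested in the first comment is commonly summarised with Cohen's kappa. The sketch below is a minimal illustration for two annotators, not the authors' procedure; the function name `cohens_kappa` and the example labels are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical strategy labels from two annotators on four queries.
annotator_1 = ["rd", "rd", "iv", "did"]
annotator_2 = ["rd", "iv", "iv", "did"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, so it is more informative than a simple match rate when a few strategies dominate the label set.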

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestions. We address each major comment below and plan to incorporate revisions to improve the clarity and rigor of the manuscript.

Referee: [Benchmark construction] The description of how the 173 queries and their ground-truth labels were extracted from the 79 papers and 3 textbooks lacks essential details on the extraction protocol, inter-annotator agreement metrics, criteria for query selection, and handling of ambiguous or incomplete specifications in the source materials. Since the central claim relies on these labels being accurate references for scoring the 34% full-specification correctness, this omission undermines confidence in the benchmark's reliability.

Authors: We agree with the referee that additional details are necessary to establish the reliability of the ground-truth labels. In the revised manuscript, we will expand Section 3 (Benchmark Construction) to include a detailed description of the extraction protocol, including how queries were selected from the papers and textbooks, the criteria used (e.g., requiring explicit identification strategies in the source material), and procedures for handling ambiguous cases (e.g., exclusion or consultation with original authors). We will also report inter-annotator agreement metrics, which we have computed as Cohen's kappa of 0.85 on a random sample of 30 queries. These additions will directly address the concern regarding label accuracy. Revision: yes.
Referee: [Evaluation and baselines] It is unclear how the identification specification is scored for correctness, particularly what constitutes a full match versus partial credit for the nuanced details. This affects the interpretation of the drop from 79% high-level strategy identification to 34% full correctness.

Authors: We appreciate this point and acknowledge that the scoring procedure for the full identification specification requires more explicit description. In the revised version, we will add a new subsection in Section 4 (Evaluation) that precisely defines the correctness criteria: a specification is deemed correct only if all components (strategy, treatment, outcome, controls, and design-specific elements) match the ground truth exactly, with no partial credit awarded. This binary scoring is intentional to highlight the difficulty of nuanced details. We will include illustrative examples of both correct and incorrect model outputs to clarify why the accuracy drops from 79% (high-level strategy) to 34% (full specification). Revision: yes.
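
The all-or-nothing criterion described in the second response can be sketched as follows. This is an assumption about how such scoring might be implemented, not the benchmark's released scorer: the dictionary keys, the helper names, and the choice to ignore the order of controls are hypothetical, and the example data is invented.

```python
def normalise(spec):
    """Put a spec into a comparable form (control order is ignored)."""
    return {
        "strategy": spec["strategy"],
        "treatment": spec["treatment"],
        "outcome": spec["outcome"],
        "controls": frozenset(spec.get("controls", ())),
        "design_elements": dict(spec.get("design_elements", {})),
    }


def strategy_correct(pred, gold):
    """Lenient score: only the high-level strategy has to match."""
    return pred["strategy"] == gold["strategy"]


def full_spec_correct(pred, gold):
    """Strict score: every component must match, no partial credit."""
    return normalise(pred) == normalise(gold)


def accuracy(pairs, scorer):
    """Share of (prediction, gold) pairs the scorer accepts."""
    return sum(scorer(p, g) for p, g in pairs) / len(pairs)


# Invented gold label and two predictions: one exact, one missing a control.
gold = {
    "strategy": "difference_in_differences",
    "treatment": "policy_adopted",
    "outcome": "turnout",
    "controls": ["age", "income"],
    "design_elements": {"time_variable": "year", "unit_variable": "state"},
}
exact = dict(gold, controls=["income", "age"])
near_miss = dict(gold, controls=["age"])
pairs = [(exact, gold), (near_miss, gold)]

strategy_acc = accuracy(pairs, strategy_correct)
full_acc = accuracy(pairs, full_spec_correct)
```

In this toy case both predictions name the right strategy, but only one survives the strict check, which is the same kind of gap as the one between the reported 79% and 34% figures.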

Circularity Check

0 steps flagged

No circularity: benchmark labels sourced from independent external papers and textbooks

The paper constructs CausalReasoningBenchmark by manually curating 173 queries and their ground-truth identification specifications plus estimates from 79 peer-reviewed papers and three textbooks. These external sources serve as the reference labels; the benchmark definition and scoring protocol (separate evaluation of identification vs. estimation) do not reduce to any self-citation, fitted parameter, or self-definitional loop within the authors' own prior work. No equations or derivations are presented that equate outputs to inputs by construction. The central claim that separate scoring enables granular diagnosis therefore rests on externally verifiable labels rather than on any internal reduction, satisfying the criteria for a self-contained benchmark with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The benchmark's utility rests on the assumption that the curated queries and their ground-truth labels faithfully represent real causal identification problems; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Ground-truth identification specifications and estimates extracted from the source papers and textbooks are accurate and unambiguous.
The entire evaluation framework depends on these external labels being correct.

Reference graph

Works this paper leans on

95 extracted references · 95 canonical work pages

[1]

Incumbency disadvantage under electoral rules with intraparty competition: Evidence from japan.The Journal of Politics, 2015

Kenichi Ariga. Incumbency disadvantage under electoral rules with intraparty competition: Evidence from japan.The Journal of Politics, 2015. doi: 10.1086/681718. URLhttps://doi.org/10.1086/681718

work page doi:10.1086/681718 2015
[2]

Technical report: Facilitating the adoption of causal inference methods through LLM-empowered co-pilot.arXiv preprint arXiv:2508.10581, 2025

Jeroen Berrevoets, Julianna Piskorz, Robert Davis, Harry Amad, Jim Weatherall, and Mihaela van der Schaar. Technical report: Facilitating the adoption of causal inference methods through LLM-empowered co-pilot.arXiv preprint arXiv:2508.10581, 2025. doi: 10.48550/arXiv.2508.10581

work page doi:10.48550/arxiv.2508.10581 2025
[3]

How does armed conflict shape investment? evidence from the mining sector.The Journal of Politics, 2022

Graeme Blair, Darin Christensen, and Valerie Wirtschafter. How does armed conflict shape investment? evidence from the mining sector.The Journal of Politics, 2022. doi: 10.1086/715255. URL https://doi.org/10.1086/ 715255

work page doi:10.1086/715255 2022
[4]

Taylor C. Boas, F. Daniel Hidalgo, and Neal P. Richardson. The spoils of victory: Campaign donations and government contracts in brazil.The Journal of Politics, 2014. doi: 10.1017/s002238161300145x. URL https://doi.org/10.1017/s002238161300145x

work page doi:10.1017/s002238161300145x 2014
[5]

Broockman and Timothy J

David E. Broockman and Timothy J. Ryan. Preaching to the choir: Americans prefer communicating to 14 copartisan elected officials.American Journal of Political Science, 2015. doi: 10.1111/ajps.12228. URL https://doi.org/10.1111/ajps.12228

work page doi:10.1111/ajps.12228 2015
[6]

Foreign aid, human rights, and democracy promotion: Evidence from a natural experiment.American Journal of Political Science, 2017

Allison Carnegie and Nikolay Marinov. Foreign aid, human rights, and democracy promotion: Evidence from a natural experiment.American Journal of Political Science, 2017. doi: 10.1111/ajps.12289. URL https://doi.org/10.1111/ajps.12289

work page doi:10.1111/ajps.12289 2017
[7]

Carson and Joel Sievert

Jamie L. Carson and Joel Sievert. Congressional candidates in the era of party ballots.The Journal of Politics,

work page
[8]

URLhttps://doi.org/10.1086/688077

doi: 10.1086/688077. URLhttps://doi.org/10.1086/688077

work page doi:10.1086/688077
[10]

Incremental democracy: The policy effects of partisan control of state government.The Journal of Politics, 2017

Devin Caughey, Christopher Warshaw, and Yiqing Xu. Incremental democracy: The policy effects of partisan control of state government.The Journal of Politics, 2017. doi: 10.1086/692669. URL https://doi.org/10. 1086/692669

work page doi:10.1086/692669 2017
[11]

Causal panel analysis under parallel trends: Lessons from a large reanalysis study.American Political Science Review, 120(1):245–266, 2026

Albert Chiu, Xingchen Lan, Ziyi Liu, and Yiqing Xu. Causal panel analysis under parallel trends: Lessons from a large reanalysis study.American Political Science Review, 120(1):245–266, 2026. doi: 10.1017/ S0003055425000243

work page 2026
[12]

Urbanization patterns, information diffusion, and female voting in rural paraguay.American Journal of Political Science,

Alberto Chong, Gianmarco Le´ on-Ciliotta, Vivian Roza, Mart´ ın Valdivia, and Gabriela Vega. Urbanization patterns, information diffusion, and female voting in rural paraguay.American Journal of Political Science,

work page
[13]

URLhttps://doi.org/10.1111/ajps.12404

doi: 10.1111/ajps.12404. URLhttps://doi.org/10.1111/ajps.12404

work page doi:10.1111/ajps.12404
[14]

The politics of property taxation: Fiscal infrastructure and electoral incentives in brazil.The Journal of Politics, 2021

Darin Christensen and Francisco Garfias. The politics of property taxation: Fiscal infrastructure and electoral incentives in brazil.The Journal of Politics, 2021. doi: 10.1086/711902. URL https://doi.org/10.1086/711902

work page doi:10.1086/711902 2021
[15]

ORCA: ORchestrating causal agent.arXiv preprint arXiv:2508.21304, 2025

Joanie Hayoun Chung, Chaemyung Lim, Sumin Lee, Songseong Kim, and Sungbin Lim. ORCA: ORchestrating causal agent.arXiv preprint arXiv:2508.21304, 2025. doi: 10.48550/arXiv.2508.21304

work page doi:10.48550/arxiv.2508.21304 2025
[16]

Andrew J. Clarke. Party sub-brands and american party factions.American Journal of Political Science, 2020. doi: 10.1111/ajps.12504. URLhttps://doi.org/10.1111/ajps.12504

work page doi:10.1111/ajps.12504 2020
[17]

Quota shocks: Electoral gender quotas and government spending priorities worldwide.The Journal of Politics, 2018

Amanda Clayton and P¨ ar Zetterberg. Quota shocks: Electoral gender quotas and government spending priorities worldwide.The Journal of Politics, 2018. doi: 10.1086/697251. URLhttps://doi.org/10.1086/697251

work page doi:10.1086/697251 2018
[20]

Alexander Coppock and Donald P. Green. Is voting habit forming? new evidence from experiments and regression discontinuities.American Journal of Political Science, 60(4):1044–1062, 2016. doi: 10.1111/ajps.12210

work page doi:10.1111/ajps.12210 2016
[21]

China y ee

Benjamin Hans Creutzfeldt. China y ee. uu. en latinoam´ erica.Revista Cient´ ıfica General Jos´ e Mar´ ıa C´ ordova,

work page
[22]

URLhttps://doi.org/10.21830/19006586.1

doi: 10.21830/19006586.1. URLhttps://doi.org/10.21830/19006586.1

work page doi:10.21830/19006586.1
[23]

Larreguy, and John Marshall

Kevin Croke, Guy Grossman, Horacio A. Larreguy, and John Marshall. Deliberate disengagement: How education can decrease political participation in electoral authoritarian regimes.American Political Science Review, 2016. doi: 10.1017/s0003055416000253. URLhttps://doi.org/10.1017/s0003055416000253

work page doi:10.1017/s0003055416000253 2016
[24]

Yale University Press, London, 2021

Scott Cunningham.Causal Inference: The Mixtape. Yale University Press, London, 2021. ISBN 9780300251685. URLhttps://mixtape.scunning.com/

work page 2021
[25]

Loyal leaders, affluent agencies: The budgetary implications of political appointments in the executive branch.The Journal of Politics, 2023

Carl Dahlstr¨ om and Mikael Holmgren. Loyal leaders, affluent agencies: The budgetary implications of political appointments in the executive branch.The Journal of Politics, 2023. doi: 10.1086/717756. URL https: //doi.org/10.1086/717756

work page doi:10.1086/717756 2023
[26]

Off-cycle and out of office: Election timing and the incumbency advantage.The Journal of Politics, 2018

Justin de Benedictis-Kessner. Off-cycle and out of office: Election timing and the incumbency advantage.The Journal of Politics, 2018. doi: 10.1086/694396. URLhttps://doi.org/10.1086/694396. 15

work page doi:10.1086/694396 2018
[27]

Greg Distelhorst and Richard M. Locke. Does compliance pay? social standards and firm-level trade, 2018. URL https://doi.org/10.31235/osf.io/tcrhq

work page doi:10.31235/osf.io/tcrhq 2018
[28]

Collective action and representation in autocracies: Evidence from russia’s great reforms.American Political Science Review, 112(1):125–147, 2018

Paul Casta˜ neda Dower, Evgeny Finkel, Scott Gehlbach, and Steven Nafziger. Collective action and representation in autocracies: Evidence from russia’s great reforms.American Political Science Review, 112(1):125–147, 2018

work page 2018
[29]

Metrics management and bureaucratic accountability: Evidence from policing.American Journal of Political Science, 2021

Laurel Eckhouse. Metrics management and bureaucratic accountability: Evidence from policing.American Journal of Political Science, 2021. doi: 10.1111/ajps.12661. URLhttps://doi.org/10.1111/ajps.12661

work page doi:10.1111/ajps.12661 2021
[30]

Eggers and Jens Hainmueller

Andrew C. Eggers and Jens Hainmueller. Mps for sale? returns to office in postwar british politics.Amer- ican Political Science Review, 2009. doi: 10.1017/s0003055409990190. URL https://doi.org/10.1017/ s0003055409990190

work page doi:10.1017/s0003055409990190 2009
[31]

Eggers and Arthur Spirling

Andrew C. Eggers and Arthur Spirling. Incumbency effects and the strength of party preferences: Evidence from multiparty elections in the united kingdom.The Journal of Politics, 2017. doi: 10.1086/690617. URL https://doi.org/10.1086/690617

work page doi:10.1086/690617 2017
[32]

Erikson, Olle Folke, and James M

Robert S. Erikson, Olle Folke, and James M. Snyder. A gubernatorial helping hand? how governors affect presidential elections.The Journal of Politics, 2015. doi: 10.1086/680186. URL https://doi.org/10.1086/ 680186

work page doi:10.1086/680186 2015
[33]

Jane Esberg and Alexandra A. Siegel. How exile shapes online opposition: Evidence from venezuela.Amer- ican Political Science Review, 2022. doi: 10.1017/s0003055422001290. URL https://doi.org/10.1017/ s0003055422001290

work page doi:10.1017/s0003055422001290 2022
[34]

Jeremy Ferwerda and Nicholas L. Miller. Political devolution and resistance to foreign rule: A natural experiment. American Political Science Review, 2014. doi: 10.1017/s0003055414000240. URL https://doi.org/10.1017/ s0003055414000240

work page doi:10.1017/s0003055414000240 2014
[35]

Olle Folke and James M. Snyder. Gubernatorial midterm slumps.American Journal of Political Science, 2012. doi: 10.1111/j.1540-5907.2012.00599.x. URLhttps://doi.org/10.1111/j.1540-5907.2012.00599.x

work page doi:10.1111/j.1540-5907.2012.00599.x 2012
[37]

Alexander Fouirnaies and Andrew B. Hall. The financial incumbency advantage: Causes and conse- quences.The Journal of Politics, 2014. doi: 10.1017/s0022381614000139. URL https://doi.org/10.1017/ s0022381614000139

work page doi:10.1017/s0022381614000139 2014
[38]

The effect of the voting rights act on enfranchisement: Evidence from north carolina.The Journal of Politics, 2018

Adriane Fresh. The effect of the voting rights act on enfranchisement: Evidence from north carolina.The Journal of Politics, 2018. doi: 10.1086/697592. URLhttps://doi.org/10.1086/697592

work page doi:10.1086/697592 2018
[39]

Elite coalitions, limited government, and fiscal capacity development: Evidence from bourbon mexico.The Journal of Politics, 2019

Francisco Garfias. Elite coalitions, limited government, and fiscal capacity development: Evidence from bourbon mexico.The Journal of Politics, 2019. doi: 10.1086/700105. URLhttps://doi.org/10.1086/700105

work page doi:10.1086/700105 2019
[40]

URL https://cacm.acm.org/research/ datasheets-for-datasets/

Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daum´ e III, and Kate Crawford. Datasheets for datasets.Communications of the ACM, 64(12):86–92, 2021. doi: 10.1145/3458723

work page doi:10.1145/3458723 2021
[41]

Gerber, Gregory A

Alan S. Gerber, Gregory A. Huber, and Ebonya Washington. Party affiliation, partisanship, and political beliefs: A field experiment.American Political Science Review, 2010. doi: 10.1017/s0003055410000407. URL https://doi.org/10.1017/s0003055410000407

work page doi:10.1017/s0003055410000407 2010
[42]

Grumbach

Jacob M. Grumbach. Laboratories of democratic backsliding.American Political Science Review, 2022. doi: 10.1017/s0003055422000934. URLhttps://doi.org/10.1017/s0003055422000934

work page doi:10.1017/s0003055422000934 2022
[43]

Grumbach and Charlotte Hill

Jacob M. Grumbach and Charlotte Hill. Rock the registration: Same day registration increases turnout of young voters.The Journal of Politics, 2022. doi: 10.1086/714776. URLhttps://doi.org/10.1086/714776

work page doi:10.1086/714776 2022
[44]

Grumbach and Alexander Sahn

Jacob M. Grumbach and Alexander Sahn. Race and representation in campaign finance.American Political Science Review, 2019. doi: 10.1017/s0003055419000637. URL https://doi.org/10.1017/s0003055419000637

work page doi:10.1017/s0003055419000637 2019
[45]

Correcting misperceptions can increase anti-immigration attitudes, 2024

Laurenz Guenther. Correcting misperceptions can increase anti-immigration attitudes, 2024. URL https: //doi.org/10.2139/ssrn.5001788. 16

work page doi:10.2139/ssrn.5001788 2024
[46]

Does direct democracy hurt immigrant minorities? evidence from naturalization decisions in switzerland.SSRN Electronic Journal, 2014

Jens Hainmueller and Dominik Hangartner. Does direct democracy hurt immigrant minorities? evidence from naturalization decisions in switzerland.SSRN Electronic Journal, 2014. doi: 10.2139/ssrn.2503141. URL https://doi.org/10.2139/ssrn.2503141

work page doi:10.2139/ssrn.2503141 2014
[47]

Andrew B. Hall. What happens when extremists win primaries?American Political Science Review, 2015. doi: 10.1017/s0003055414000641. URLhttps://doi.org/10.1017/s0003055414000641

work page doi:10.1017/s0003055414000641 2015
[48]

Hall and Daniel M

Andrew B. Hall and Daniel M. Thompson. Who punishes extremist nominees? candidate ideology and turning out the base in us elections.American Political Science Review, 2018. doi: 10.1017/s0003055418000023. URL https://doi.org/10.1017/s0003055418000023

work page doi:10.1017/s0003055418000023 2018
[49]

The supply-equity trade-off: The effect of spatial representation on the local housing supply.The Journal of Politics, 2023

Michael Hankinson and Asya Magazinnik. The supply-equity trade-off: The effect of spatial representation on the local housing supply.The Journal of Politics, 2023. doi: 10.1086/723818. URL https://doi.org/10.1086/ 723818

work page doi:10.1086/723818 2023
[50]

Childhood socialization and political attitudes: Evidence from a natural experiment.The Journal of Politics, 2013

Andrew Healy and Neil Malhotra. Childhood socialization and political attitudes: Evidence from a natural experiment.The Journal of Politics, 2013. doi: 10.1017/s0022381613000996. URL https://doi.org/10.1017/ s0022381613000996

work page doi:10.1017/s0022381613000996 2013
[51]

Hern´ an and James M

Miguel A. Hern´ an and James M. Robins.Causal Inference: What If. Chapman & Hall/CRC, Boca Raton, 2020. URLhttps://miguelhernan.org/whatifbook

work page 2020
[52]

Daniel Hidalgo and Simeon Nichter

F. Daniel Hidalgo and Simeon Nichter. Voter buying: Shaping the electorate through clientelism.American Journal of Political Science, 2015. doi: 10.1111/ajps.12214. URLhttps://doi.org/10.1111/ajps.12214

work page doi:10.1111/ajps.12214 2015
[53]

Olson, and James M

Shigeo Hirano, Jaclyn Kaslovsky, Michael P. Olson, and James M. Snyder. The growth of campaign advertising in the united states, 1880–1930.The Journal of Politics, 2022. doi: 10.1086/719008. URL https://doi.org/ 10.1086/719008

work page doi:10.1086/719008 1930
[54]

Holbein and D

John B. Holbein and D. Sunshine Hillygus. Making young voters: The impact of preregistration on youth turnout.American Journal of Political Science, 2015. doi: 10.1111/ajps.12177. URL https://doi.org/10. 1111/ajps.12177

work page doi:10.1111/ajps.12177 2015
[55]

CRC Press, Taylor & Francis Group, Boca Raton, 2022

Nick Huntington-Klein.The Effect: An Introduction to Research Design and Causality. CRC Press, Taylor & Francis Group, Boca Raton, 2022. ISBN 9781032125787

work page 2022
[56]

causaldata: Example data sets for causal inference textbooks, 2021.URL https://github

Nick Huntington-Klein and Malcolm Barrett. causaldata: Example data sets for causal inference textbooks, 2021.URL https://github. com/nickch-k/causaldata. R package version 0.1, 4

work page 2021
[57]

Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D

Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, and Bernhard Sch¨ olkopf. Can large language models infer causation from correlation?arXiv preprint arXiv:2306.05836, 2023. doi: 10.48550/arXiv.2306.05836

work page doi:10.48550/arxiv.2306.05836 2023
[58]

CLadder: Assessing causal reasoning in language models

Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, and Bernhard Sch¨ olkopf. CLadder: Assessing causal reasoning in language models. InAdvances in Neural Information Processing Systems, volume 36, pages 31038–31065, 2023

work page 2023
[59]

Public money talks too: How public campaign financing degrades representation.American Journal of Political Science, 2021

Mitchell Kilborn and Arjun Vishwanath. Public money talks too: How public campaign financing degrades representation.American Journal of Political Science, 2021. doi: 10.1111/ajps.12625. URL https://doi.org/ 10.1111/ajps.12625

work page doi:10.1111/ajps.12625 2021
[60]

Direct democracy and women’s political engagement.American Journal of Political Science, 63(3):594–610, 2019

Jeong Hyun Kim. Direct democracy and women’s political engagement.American Journal of Political Science, 63(3):594–610, 2019

work page 2019
[61]

The incumbency curse: Weak parties, term limits, and unfulfilled accountability.American Political Science Review, 2017

Marko Klaˇ snja and Roc´ ıo Titiunik. The incumbency curse: Weak parties, term limits, and unfulfilled accountability.American Political Science Review, 2017. doi: 10.1017/s0003055416000575. URL https: //doi.org/10.1017/s0003055416000575

work page doi:10.1017/s0003055416000575 2017
[62]

Motivated corporate political action: Evidence from an sec experiment.The Journal of Politics, 2023

Mary Kroeger and Maria Silfa. Motivated corporate political action: Evidence from an sec experiment.The Journal of Politics, 2023. doi: 10.1086/723998. URLhttps://doi.org/10.1086/723998

work page doi:10.1086/723998 2023
[64]

The representational consequences of municipal civil service reform

Nicholas Kuipers and Alexander Sahn. The representational consequences of municipal civil service reform. American Political Science Review, 2022. doi: 10.1017/s0003055422000521. URL https://doi.org/10.1017/ s0003055422000521

work page doi:10.1017/s0003055422000521 2022
[65]

How much should we trust instrumental variable estimates in political science? practical advice based on 67 replicated studies.Political Analysis, 32(4):521–540,

Apoorva Lal, Mackenzie Lockhart, Yiqing Xu, and Ziwen Zu. How much should we trust instrumental variable estimates in political science? practical advice based on 67 replicated studies.Political Analysis, 32(4):521–540,

work page
[66]

doi: 10.1017/pan.2024.2

work page doi:10.1017/pan.2024.2 2024
[67]

Anger and its consequences for judgment and behavior: Recent developments in social and political psychology, 2018

Alan Lambert, Fade Eadeh, and Emily Hanson. Anger and its consequences for judgment and behavior: Recent developments in social and political psychology, 2018. URLhttps://doi.org/10.31234/osf.io/svcux_v1

work page doi:10.31234/osf.io/svcux_v1 2018
[68]

Corporate board quotas and gender equality policies in the workplace

Audrey Latura and Ana Catalano Weeks. Corporate board quotas and gender equality policies in the workplace. American Journal of Political Science, 2022. doi: 10.1111/ajps.12709. URL https://doi.org/10.1111/ajps. 12709

work page doi:10.1111/ajps.12709 2022
[69]

Donggyu Lee, Sungwon Park, Yerin Hwang, Hyoshin Kim, Hyunwoo Oh, Jungwon Kim, Meeyoung Cha, Sangyoon Park, and Jihee Kim. Benchmarking LLM causal reasoning with scientifically validated relationships. arXiv preprint arXiv:2510.07231, 2025. doi: 10.48550/arXiv.2510.07231

Yphtach Lelkes, Gaurav Sood, and Shanto Iyengar. The hostile audience: The effect of access to broadband internet on partisan affect. American Journal of Political Science, 2015. doi: 10.1111/ajps.12237. URL https://doi.org/10.1111/ajps.12237

Amy E. Lerman and Katherine T. McCabe. Personal experience and public opinion: A theory and test of conditional policy feedback. The Journal of Politics, 2017. doi: 10.1086/689286. URL https://doi.org/10.1086/689286

Steven Liao. The effect of firm lobbying on high-skilled visa adjudication. The Journal of Politics, 2023. doi: 10.1086/723984. URL https://doi.org/10.1086/723984

Zeqi Liu, Ke Li, Yu Cheng, Lichao Xue, Xuhui Fan, Yue Chen, Aobo Yang, Kun Ma, Zhiyuan Zhao, Peng Jiang, Yuxiang Zhou, Hao Wang, Jianxing Yu, Qian Zhang, Yang Liu, and Yangfeng Ji. Are LLMs capable of data-based statistical and causal reasoning? benchmarking advanced quantitative reasoning with data. In Findings of the Association for Computational Lingui..., 2024

Beatriz Magaloni, Edgar Franco-Vivanco, and Vanessa Melo. Killing in the slums: Social order, criminal governance, and police violence in Rio de Janeiro. American Political Science Review, 2020. doi: 10.1017/s0003055419000856. URL https://doi.org/10.1017/s0003055419000856

Wayde Z. C. Marsh. Trauma and turnout: The political consequences of traumatic events. American Political Science Review, 2022. doi: 10.1017/s0003055422001010. URL https://doi.org/10.1017/s0003055422001010

Gwyneth H. McClendon. Social esteem and participation in contentious politics: A field experiment at an LGBT pride rally. American Journal of Political Science, 2013. doi: 10.1111/ajps.12076. URL https://doi.org/10.1111/ajps.12076

Michael McDevitt and Steven Chaffee. From top-down to trickle-up influence: Revisiting assumptions about the family in political socialization. Political Communication, 2002. doi: 10.1080/01957470290055501. URL https://doi.org/10.1080/01957470290055501

Marc Meredith. Exploiting friends-and-neighbors to estimate coattail effects. American Political Science Review, 2013. doi: 10.1017/s0003055413000439. URL https://doi.org/10.1017/s0003055413000439

Gareth Nellis and Niloufer Siddiqui. Secular party rule and religious violence in Pakistan. American Political Science Review, 2017. doi: 10.1017/s0003055417000491. URL https://doi.org/10.1017/s0003055417000491

Lucas M. Novaes. Disloyal brokers and weak parties. American Journal of Political Science, 2017. doi: 10.1111/ajps.12331. URL https://doi.org/10.1111/ajps.12331

Ana L. De La O. Do conditional cash transfers affect electoral behavior? evidence from a randomized experiment in Mexico. American Journal of Political Science, 2012. doi: 10.1111/j.1540-5907.2012.00617.x. URL https://doi.org/10.1111/j.1540-5907.2012.00617.x

Agustina S. Paglayan. Education or indoctrination? the violent origins of public school systems in an era of state-building. American Political Science Review, 2022. doi: 10.1017/s0003055422000247. URL https://doi.org/10.1017/s0003055422000247

Maxwell Palmer and Benjamin Schneer. Capitol gains: The returns to elected office from corporate board directorships. The Journal of Politics, 2016. doi: 10.1086/683206. URL https://doi.org/10.1086/683206

Julia A. Payson. The partisan logic of city mobilization: Evidence from state lobbying disclosures. American Political Science Review, 2020. doi: 10.1017/s0003055420000118. URL https://doi.org/10.1017/s0003055420000118
