Probing the Stochastic Machine: Engaging with LLMs in Statistics Curricula Through Veridical Data Science
Pith reviewed 2026-06-30 04:15 UTC · model grok-4.3
The pith
Statistics curricula should treat LLMs as objects of inquiry, with students designing experiments to analyze output variability and bias.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large language models are interactive stochastic systems whose most consequential behaviors remain only partially understood. Statistics curricula should therefore treat them as objects of inquiry: students probe variability, bias, and prompt sensitivity by designing small experiments and analyzing distributions of outputs. The Veridical Data Science framework and PCS principles organize this engagement across educational levels, illustrated by four proposed curricular examples.
What carries the argument
The Veridical Data Science framework, together with the Predictability-Computability-Stability (PCS) principles, structures student experiments that treat LLM outputs as data for statistical analysis.
If this is right
- Introductory students begin with activities such as asking an LLM the same question twice to observe response variability.
- Graduate students conduct PCS stability audits on entire LLM-assisted analysis workflows.
- Students at all levels practice treating sequences of LLM outputs as empirical distributions that require statistical summarization.
- Curricula gain explicit attention to how prompt changes alter output distributions and introduce bias.
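The activities listed above can be illustrated with a minimal sketch. It assumes a hypothetical `fake_llm` function that samples answers from an invented distribution; this is a stand-in, not a real model or API, and the prompts and weights are made up for illustration. In a classroom, the stand-in would be replaced by repeated calls to an actual LLM. The sketch asks the same prompt many times, tabulates the empirical distribution of answers, and measures how much a change in wording shifts that distribution.

```python
import random
from collections import Counter

# Hypothetical stand-in for an LLM: each prompt maps to a fixed
# distribution over short answers. These numbers are invented for
# illustration and do not describe any real model.
ANSWER_WEIGHTS = {
    "Is a p-value of 0.04 significant?": {"yes": 0.7, "it depends": 0.25, "no": 0.05},
    "Is p = 0.04 strong evidence?": {"yes": 0.3, "it depends": 0.5, "no": 0.2},
}


def fake_llm(prompt, rng):
    """Sample one answer for `prompt` (a toy replacement for a model call)."""
    weights = ANSWER_WEIGHTS[prompt]
    answers = list(weights)
    return rng.choices(answers, weights=[weights[a] for a in answers], k=1)[0]


def empirical_distribution(prompt, n, rng):
    """Ask the same prompt n times and return answer -> relative frequency."""
    counts = Counter(fake_llm(prompt, rng) for _ in range(n))
    return {answer: count / n for answer, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


rng = random.Random(0)  # fixed seed so the classroom run is reproducible
prompts = list(ANSWER_WEIGHTS)
dist_a = empirical_distribution(prompts[0], 200, rng)
dist_b = empirical_distribution(prompts[1], 200, rng)
print("prompt A:", dist_a)
print("prompt B:", dist_b)
print("TV distance between prompts:", round(total_variation(dist_a, dist_b), 3))
```

Total variation distance is only one possible summary; students could equally compare proportions with confidence intervals or use a chi-squared test of homogeneity.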
Where Pith is reading between the lines
- The method could extend to comparing output distributions across different LLMs to illustrate model selection principles.
- Integration with traditional simulation exercises might help isolate what students learn specifically from real LLM stochasticity.
- Departments adopting this approach may need to develop shared prompt libraries so experiments remain reproducible across classes.
Load-bearing premise
Student-designed experiments on LLM outputs will reliably build transferable statistical reasoning skills even though the models' randomness may create confusion or misleading intuitions.
What would settle it
A controlled comparison in which students who complete LLM probing activities show no measurable improvement in understanding variability, bias, or prompt effects compared with students taught the same concepts through conventional examples.
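One way such a comparison might be analyzed is a permutation test on the difference in mean assessment scores. The following is a minimal sketch under stated assumptions: the scores are simulated rather than real student data, the group sizes and score distribution are arbitrary, and the choice of test is an illustration, not something the paper specifies.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_permutations, rng):
    """Two-sided permutation p-value for the difference in group means."""
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)


rng = random.Random(1)
# Simulated assessment scores, NOT real student data: both groups are drawn
# from the same distribution, i.e. the "no measurable improvement" scenario.
llm_group = [rng.gauss(70, 10) for _ in range(30)]
conventional_group = [rng.gauss(70, 10) for _ in range(30)]

diff, p_value = permutation_test(llm_group, conventional_group, 2000, rng)
print(f"difference in means: {diff:.2f}, permutation p-value: {p_value:.3f}")
```

A large p-value here would not by itself show that the activities are ineffective; a real study would also need adequate sample size and a validated assessment instrument.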
Original abstract
Large language models (LLMs) are interactive stochastic systems whose most consequential behaviors are still only partially understood. This discussion argues that statistics curricula should treat LLMs not only as tools, but as objects of inquiry: students can probe variability, bias, and prompt sensitivity by designing small experiments and analyzing distributions of outputs. Building on the Veridical Data Science framework and Predictability-Computability-Stability (PCS) principles, this discussion outlines how to organize critical LLM engagement across educational levels and proposes four curricular examples, from introductory "ask it twice" activities to graduate PCS stability audits of LLM-based analysis workflows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a discussion piece arguing that statistics curricula should treat large language models (LLMs) not only as tools but as objects of inquiry. Students would design small experiments to probe variability, bias, and prompt sensitivity in LLM outputs, organized around the Veridical Data Science framework and Predictability-Computability-Stability (PCS) principles. Four curricular examples are proposed, ranging from introductory 'ask it twice' activities to graduate-level PCS stability audits of LLM-based analysis workflows.
Significance. If adopted, the proposal could help statistics education adapt to AI tools by reinforcing core concepts of uncertainty and reproducibility through direct engagement with stochastic systems. The VDS/PCS grounding provides a coherent structure for critical inquiry that aligns with existing statistical pedagogy and may foster transferable reasoning skills about model behavior.
minor comments (2)
- The abstract and introduction would benefit from a brief explicit list or table summarizing the four curricular examples and their intended educational levels to improve readability for readers scanning the proposal.
- The section describing the graduate-level PCS stability audits should clarify how the stability metric is operationalized when applied to LLM workflows, as the current framing leaves the concrete implementation steps somewhat open.
Simulated Author's Rebuttal
We thank the referee for their positive summary and significance assessment of the manuscript, as well as the recommendation for minor revision. The referee's description accurately captures the core proposal of treating LLMs as stochastic systems for student experimentation within the Veridical Data Science and PCS framework.
Circularity Check
No significant circularity identified
The paper is a position/discussion piece proposing curricular activities for engaging LLMs in statistics education. It contains no derivations, equations, fitted parameters, predictions, or load-bearing self-citations that reduce claims to inputs by construction. References to Veridical Data Science and PCS principles are external frameworks used to organize suggestions, not self-referential definitions or unverified uniqueness theorems. The central recommendation stands as an independent pedagogical proposal without internal reduction to its own assumptions.