Probing the Stochastic Machine: Engaging with LLMs in Statistics Curricula Through Veridical Data Science

Tian Zheng

arxiv: 2606.29754 · v1 · pith:KPJ65RVAnew · submitted 2026-06-29 · 📊 stat.AP

Pith reviewed 2026-06-30 04:15 UTC · model grok-4.3

keywords: large language models, statistics education, veridical data science, PCS principles, stochastic systems, prompt sensitivity, curriculum examples, data science education

The pith

Statistics curricula should treat LLMs as objects of inquiry where students design experiments to analyze output variability and bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that LLMs function as interactive stochastic systems whose behaviors require direct statistical investigation rather than passive use. Students learn core concepts by creating small experiments that measure how outputs vary, respond to different prompts, and exhibit bias, then analyzing those distributions. This method follows the Veridical Data Science framework and PCS principles to structure activities that scale from introductory repetition checks to graduate-level workflow audits. Four concrete curricular examples illustrate how the approach fits different education levels.

Core claim

Large language models are interactive stochastic systems whose most consequential behaviors remain only partially understood. Statistics curricula should therefore treat them as objects of inquiry: students probe variability, bias, and prompt sensitivity by designing small experiments and analyzing distributions of outputs. The Veridical Data Science framework and PCS principles organize this engagement across educational levels, illustrated by four proposed curricular examples.

What carries the argument

Veridical Data Science framework together with Predictability-Computability-Stability (PCS) principles, used to structure student experiments that treat LLM outputs as data for statistical analysis.

If this is right

Introductory students begin with activities such as asking an LLM the same question twice to observe response variability.
Graduate students conduct PCS stability audits on entire LLM-assisted analysis workflows.
Students at all levels practice treating sequences of LLM outputs as empirical distributions that require statistical summarization.
Curricula gain explicit attention to how prompt changes alter output distributions and introduce bias.
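
One way to picture the "ask it twice" idea scaled up to many repetitions is sketched below. It is only an illustration, not an activity taken from the paper: `fake_llm` is a hypothetical stand-in that returns canned answers at random, so the example runs without any model access, and the answer categories, weights, and question are invented for the demonstration.

```python
import math
import random
from collections import Counter


def fake_llm(prompt, rng):
    """Hypothetical stand-in for a real model call (assumption, not a real API).

    Returns one of a few canned answers at random, ignoring the prompt.
    """
    answers = ["yes", "no", "it depends"]
    weights = [0.6, 0.3, 0.1]
    return rng.choices(answers, weights=weights, k=1)[0]


def empirical_distribution(responses):
    """Relative frequency of each distinct response."""
    counts = Counter(responses)
    n = len(responses)
    return {answer: count / n for answer, count in counts.items()}


def shannon_entropy(dist):
    """Entropy in bits; 0 means the same answer every time."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


rng = random.Random(0)  # fixed seed so the classroom demo is repeatable
question = "Is a p-value of 0.04 strong evidence?"  # invented example prompt
responses = [fake_llm(question, rng) for _ in range(200)]

dist = empirical_distribution(responses)
entropy = shannon_entropy(dist)
print(dist)
print(round(entropy, 3))
```

In a real activity the stand-in would be replaced by whatever model access the class has, and free-text responses would first need to be coded into categories before they can be tabulated this way.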

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could extend to comparing output distributions across different LLMs to illustrate model selection principles.
Integration with traditional simulation exercises might help isolate what students learn specifically from real LLM stochasticity.
Departments adopting this approach may need to develop shared prompt libraries so experiments remain reproducible across classes.
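
The cross-model comparison suggested above could be sketched as a distance between two empirical output distributions. The example below uses total variation distance, which is one convenient choice and not something the paper prescribes; the coded responses for `model_a` and `model_b` are hypothetical data made up for illustration.

```python
from collections import Counter


def empirical_distribution(responses):
    """Relative frequency of each distinct coded response."""
    n = len(responses)
    return {label: count / n for label, count in Counter(responses).items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


# Hypothetical coded responses to the same prompt from two models.
model_a = ["agree"] * 14 + ["disagree"] * 6
model_b = ["agree"] * 8 + ["disagree"] * 10 + ["refuse"] * 2

tvd = total_variation(
    empirical_distribution(model_a),
    empirical_distribution(model_b),
)
print(round(tvd, 3))
```

A distance of 0 means the two empirical distributions coincide and 1 means they share no outcomes. With only 20 responses per model the estimate is noisy, which is itself a useful point for students to discuss.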

Load-bearing premise

Student-designed experiments on LLM outputs will reliably build transferable statistical reasoning skills even though the models' randomness may create confusion or misleading intuitions.

What would settle it

A controlled comparison in which students who complete LLM probing activities show no measurable improvement in understanding variability, bias, or prompt effects compared with students taught the same concepts through conventional examples.
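
A controlled comparison of this kind could be analyzed in many ways. The sketch below uses a two-sided permutation test on the difference in mean assessment scores, chosen here only because it needs few assumptions. The scores are hypothetical numbers invented for illustration, not results from any study, and the function name and defaults are likewise assumptions.

```python
import random


def permutation_test(treated, control, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (treated minus control) and a p-value.
    """
    rng = random.Random(seed)
    n_t = len(treated)
    observed = sum(treated) / n_t - sum(control) / len(control)
    pooled = list(treated) + list(control)
    n_c = len(pooled) - n_t
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_c
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical post-test scores (invented for this sketch).
llm_probing_group = [72, 68, 75, 80, 66, 71, 77, 69]
conventional_group = [70, 74, 65, 78, 69, 73, 67, 76]

observed, p_value = permutation_test(llm_probing_group, conventional_group)
print(observed, p_value)
```

A large p-value in a well-powered study would be the "no measurable improvement" outcome described above; with groups this small, the test could not distinguish a modest benefit from none.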

Original abstract

Large language models (LLMs) are interactive stochastic systems whose most consequential behaviors are still only partially understood. This discussion argues that statistics curricula should treat LLMs not only as tools, but as objects of inquiry: students can probe variability, bias, and prompt sensitivity by designing small experiments and analyzing distributions of outputs. Building on the Veridical Data Science framework and Predictability-Computability-Stability (PCS) principles, this discussion outlines how to organize critical LLM engagement across educational levels and proposes four curricular examples, from introductory "ask it twice" activities to graduate PCS stability audits of LLM-based analysis workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A curricular proposal for probing LLMs in stats classes that makes sense on paper but lacks any supporting evidence.

The paper's main suggestion is to have statistics students treat LLMs as objects of study by running small experiments on their output distributions to learn about variability, bias, and stability. It outlines activities at different levels based on the Veridical Data Science and PCS frameworks.

This builds sensibly on prior work by the author. The examples, like asking the model the same question twice in intro classes or auditing stability in advanced ones, give a clear way to integrate the idea without needing new theory. It correctly identifies that LLMs are stochastic and that probing them can illustrate statistical principles.

The soft spot is the complete lack of any trial or assessment. We have no information on whether these activities actually improve learning or if the unpredictability of LLMs creates more confusion than insight. The proposal flags this issue but leaves it unaddressed.

This piece is for statistics educators who are updating their courses to deal with AI tools. Someone looking for practical ideas on how to discuss LLMs in class might find the structure helpful.

It should go to peer review for a discussion or education-focused outlet. The recommendation is worth considering even without data, as long as reviewers understand it's a starting point for conversation.

Referee Report

0 major / 2 minor

Summary. The manuscript is a discussion piece arguing that statistics curricula should treat large language models (LLMs) not only as tools but as objects of inquiry. Students would design small experiments to probe variability, bias, and prompt sensitivity in LLM outputs, organized around the Veridical Data Science framework and Predictability-Computability-Stability (PCS) principles. Four curricular examples are proposed, ranging from introductory 'ask it twice' activities to graduate-level PCS stability audits of LLM-based analysis workflows.

Significance. If adopted, the proposal could help statistics education adapt to AI tools by reinforcing core concepts of uncertainty and reproducibility through direct engagement with stochastic systems. The VDS/PCS grounding provides a coherent structure for critical inquiry that aligns with existing statistical pedagogy and may foster transferable reasoning skills about model behavior.

minor comments (2)

The abstract and introduction would benefit from a brief explicit list or table summarizing the four curricular examples and their intended educational levels to improve readability for readers scanning the proposal.
The section describing the graduate-level PCS stability audits should clarify how the stability metric is operationalized when applied to LLM workflows, as the current framing leaves the concrete implementation steps somewhat open.
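
To make the second comment concrete, one possible operationalization of stability is sketched below. It is an assumption for illustration, not the paper's definition: the workflow is rerun under several perturbations, each run's conclusion is coded into a category, and stability is the share of runs agreeing with the most common conclusion. The perturbation labels and conclusions are hypothetical.

```python
from collections import Counter


def modal_agreement(conclusions):
    """Return the most common conclusion and the share of runs that match it."""
    counts = Counter(conclusions)
    modal, modal_count = counts.most_common(1)[0]
    return modal, modal_count / len(conclusions)


# Hypothetical audit log: one coded conclusion per perturbed rerun.
runs = [
    {"perturbation": "original prompt", "conclusion": "effect positive"},
    {"perturbation": "paraphrased prompt", "conclusion": "effect positive"},
    {"perturbation": "different random seed", "conclusion": "effect positive"},
    {"perturbation": "reordered data columns", "conclusion": "no effect"},
    {"perturbation": "alternative cleaning rule", "conclusion": "effect positive"},
]

modal, stability = modal_agreement([run["conclusion"] for run in runs])
unstable = [run["perturbation"] for run in runs if run["conclusion"] != modal]
print(modal, stability, unstable)
```

Listing which perturbations flip the conclusion is often more informative than the single score, since it points to the step in the workflow that deserves scrutiny.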

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary and significance assessment of the manuscript, as well as the recommendation for minor revision. The referee's description accurately captures the core proposal of treating LLMs as stochastic systems for student experimentation within the Veridical Data Science and PCS framework.

Circularity Check

0 steps flagged

No significant circularity identified

The paper is a position/discussion piece proposing curricular activities for engaging LLMs in statistics education. It contains no derivations, equations, fitted parameters, predictions, or load-bearing self-citations that reduce claims to inputs by construction. References to Veridical Data Science and PCS principles are external frameworks used to organize suggestions, not self-referential definitions or unverified uniqueness theorems. The central recommendation stands as an independent pedagogical proposal without internal reduction to its own assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical content, free parameters, axioms, or invented entities are present; this is an educational discussion paper.

pith-pipeline@v0.9.1-grok · 5622 in / 1037 out tokens · 60758 ms · 2026-06-30T04:15:16.369336+00:00 · methodology
