PLanet: Formalizing and Analyzing Assignment Procedures in the Design of Experiments

Adam Chlipala; Anna Zhang; Emery Berger; Eunice Jun; London Bielicke; Shruti Tyagi

arxiv: 2505.09094 · v3 · submitted 2025-05-14 · 💻 cs.HC

London Bielicke, Anna Zhang, Shruti Tyagi, Emery Berger, Adam Chlipala, Eunice Jun

Pith reviewed 2026-05-22 16:09 UTC · model grok-4.3

classification 💻 cs.HC

keywords experimental design · causal queries · domain-specific language · matrix algebra · static analysis · constraint satisfaction · assignment procedures

The pith

PLanet's grammar and matrix representation for experimental designs enable static analysis to identify testable causal queries under different assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Experimental designs rest on assumptions about variable relationships that determine which causal questions can be answered. Existing tools leave these assumptions implicit, so researchers must reason about them manually. PLanet introduces a composable grammar of operators for building assignment procedures, expressed through matrix algebra and compiled into constraint satisfaction problems. This representation supports static analysis that automatically checks which causal queries are testable given specific assumptions. The result is more explicit design exploration and fewer overlooked assumptions about the variables involved.

Core claim

By defining a grammar of composable operators for assignment procedures and representing them in matrix algebra, PLanet compiles designs into constraint satisfaction problems. This allows a static analysis to determine the testability of causal queries under varying assumptions, making design choices and assumptions explicit without requiring full procedural code.

What carries the argument

Composable grammar of operators for assignment procedures, grounded in matrix algebra and compiled to constraint satisfaction problems over matrices.
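
To make "constraint satisfaction problems over matrices" concrete, here is a minimal sketch. It is not PLanet's implementation: the function name `latin_square_plans`, the two constraints, and the exhaustive search are assumptions chosen for illustration, standing in for the solver-based compilation the paper describes. A design is treated as a plans-by-trials matrix of condition indices, and the sketch enumerates every matrix in which each row and each column uses every condition exactly once.

```python
from itertools import permutations, product


def latin_square_plans(n_conditions):
    """Enumerate all n x n matrices whose rows and columns are each
    a permutation of range(n_conditions).

    Hypothetical helper: a brute-force stand-in for a constraint solver.
    """
    rows = list(permutations(range(n_conditions)))
    solutions = []
    for candidate in product(rows, repeat=n_conditions):
        # The row constraint holds by construction (each row is a permutation).
        # Check the column constraint: no condition repeats within a column.
        columns_ok = all(
            len({row[c] for row in candidate}) == n_conditions
            for c in range(n_conditions)
        )
        if columns_ok:
            solutions.append(candidate)
    return solutions


if __name__ == "__main__":
    squares = latin_square_plans(3)
    print(len(squares), "matrices satisfy both constraints")
    for row in squares[0]:
        print(row)
```

Exhaustive search is only workable for very small designs, which is one reason a real system would hand such constraints to a solver rather than enumerate candidates.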

Load-bearing premise

The matrix algebra representation and constraint satisfaction encoding must capture all relevant assumptions about variable relationships without omitting causal structure or introducing artifacts.

What would settle it

An experimental design in which PLanet's static analysis labels a causal query as testable but a manual causal graph analysis shows it is not identifiable, or the reverse.

Figures

Figures reproduced from arXiv: 2505.09094 by Adam Chlipala, Anna Zhang, Emery Berger, Eunice Jun, London Bielicke, Shruti Tyagi.

**Figure 1.** Composing designs in PLanet using cross and nest. In the crossed design (left), every row contains every condition of each variable but not every combination (e.g., X and Y appear with A and B, but not all combinations XA, XB, YA, YB appear in one row). In the nested design (right), the outer condition (A or B) is held fixed within each 2 × 2 block while the inner conditions (X and Y) alternate. Variables.…

**Figure 2.** Our formal grammar of experimental assignment.

**Figure 3.** Generating viable experimental plans. PLanet determines the shape of the design matrix and places constraints on entries of the matrix (left) before generating a Z3 model (middle). The numbers in the matrix map directly to specific values of a bitvector encoding, which represents possible assignments to a set of variables. The last step translates the matrix to a table with all viable experimental plans (…

**Figure 4.** PLanet’s user interface comparing two experimental designs from our user evaluation (Section 8).

**Figure 5.** PLanet program (left) representing an experiment

**Figure 6.** Nested design from Sweating the Details: Emotion Recognition and the Influence of Physical Exertion in Virtual Reality Exergaming [29] implemented in PLanet and edibble. PLanet correctly and explicitly represents that both the Exercise Intensity and Emotion VE conditions are counterbalanced and that there are 72 participants. The edibble program does not correctly represent this design. edibble’s strict u…

**Figure 7.** R1’s intended experimental assignment and PLanet program.
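
The Figure 3 caption says that numbers in the design matrix map to values of a bitvector encoding over a set of variables. The sketch below shows one way such a mapping could work. The bit layout and the helper names `bits_needed`, `encode_cell`, and `decode_cell` are assumptions for illustration; the caption does not specify PLanet's actual layout.

```python
def bits_needed(n_levels):
    """Number of bits required to index n_levels distinct conditions."""
    return max(1, (n_levels - 1).bit_length())


def encode_cell(levels, choice):
    """Pack one condition index per variable into a single integer.

    levels: dict mapping variable -> list of condition names
            (dict insertion order fixes the bit layout; an assumption)
    choice: dict mapping variable -> chosen condition name
    """
    value, shift = 0, 0
    for var, names in levels.items():
        value |= names.index(choice[var]) << shift
        shift += bits_needed(len(names))
    return value


def decode_cell(levels, value):
    """Invert encode_cell: recover the chosen condition for each variable."""
    out = {}
    for var, names in levels.items():
        width = bits_needed(len(names))
        out[var] = names[value & ((1 << width) - 1)]
        value >>= width
    return out


if __name__ == "__main__":
    # Hypothetical variables and conditions.
    levels = {"interface": ["A", "B"], "task": ["X", "Y", "Z"]}
    matrix = [
        [encode_cell(levels, {"interface": "A", "task": "X"}),
         encode_cell(levels, {"interface": "B", "task": "Z"})],
    ]
    # Translate the integer matrix back into a readable table.
    table = [[decode_cell(levels, v) for v in row] for row in matrix]
    print(matrix)
    print(table)
```

With three task conditions packed into two bits, one bit pattern is unused; a solver-based encoding would need a constraint ruling it out, whereas `decode_cell` here simply raises `IndexError` on it.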

Original abstract

Experimental designs reflect assumptions about variable relationships that determine what causal queries researchers can answer through the experiment. Accounting for and communicating these assumptions is essential for drawing valid, generalizable conclusions from scientific experiments. Unfortunately, existing experimental design tools elide these details, expecting researchers to reason about design decisions and assumptions on their own. To surface assumptions and enable design exploration, we introduce a grammar of composable operators for constructing experimental assignment procedures grounded in matrix algebra. The PLanet DSL implements this grammar and compiles PLanet programs into constraint satisfaction problems over matrices. Together, PLanet's composable grammar and matrix representation enable a static analysis to determine which causal queries are testable under different assumptions. In an expressivity evaluation, PLanet was the most expressive of existing DSLs. Critical reflections with the authors of these DSLs revealed that PLanet makes design choices explicit without requiring procedural specification. Think-aloud studies showed that PLanet facilitated design exploration and surfaced assumptions researchers may otherwise overlook.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PLanet gives a composable matrix-based grammar for assignment procedures that supports static checks on testable causal queries, but the encoding's fidelity to full causal structure still needs verification.

The main thing to know is that PLanet supplies a grammar of composable operators for writing down experimental assignment procedures, then compiles them to constraint satisfaction problems over matrices so a static analysis can flag which causal queries remain testable under the stated assumptions. That combination is new relative to the prior DSLs they compare against. The expressivity evaluation positions it ahead of those tools, and the think-aloud sessions plus reflections with the other DSL authors indicate it surfaces design choices without forcing users to spell out every procedural step. Those are the concrete advances.

The soft spot is the matrix-plus-CSP representation itself. Matrix algebra is natural for linear assignments and marginals, but it is not obvious that it preserves all the conditional independencies, latent variables, or non-linear intervention effects that determine identifiability in real designs. The abstract gives no quantitative details on how the static analysis was checked against ground-truth queries or how many participants were in the studies, so the strength of the supporting evidence is still unclear.

This work is aimed at HCI researchers and others who run experiments and want to make their design assumptions explicit and checkable. A reader who cares about formal methods for empirical work would get practical value from the operators and the analysis pipeline. It deserves a serious referee because the core idea is useful and the implementation is grounded enough to be worth checking in detail, even if the causal soundness claims will need more evidence.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PLanet, a domain-specific language for formalizing experimental assignment procedures via a composable grammar grounded in matrix algebra. PLanet programs are compiled into constraint satisfaction problems over matrices, which in turn support a static analysis that determines which causal queries are testable under different assumptions about variable relationships. The work reports an expressivity evaluation showing PLanet outperforms prior DSLs, critical reflections with authors of those DSLs, and think-aloud studies indicating that the approach surfaces assumptions and supports design exploration.

Significance. If the matrix representation and CSP encoding are shown to be sound and complete for the range of designs claimed, the work could meaningfully advance experimental design practice in HCI and related fields by making implicit causal assumptions explicit and enabling automated testability analysis. The formal grounding in matrix algebra and the combination of expressivity comparison with user studies are positive features; the result would be more impactful if it included falsifiable checks against established causal-identifiability results.

major comments (2)

[Sections describing the matrix representation and static analysis (likely §3–4)] The central claim that the grammar's matrix representation and compilation to CSPs faithfully encode all relevant assumptions (independence, blocking, randomization, latent variables) so that static analysis correctly decides testability is load-bearing. Matrix algebra naturally represents linear marginals and assignments but risks distorting conditional independencies or non-linear intervention effects; the manuscript should supply either a formal soundness argument or an empirical validation against known causal diagrams for standard designs (e.g., blocked RCTs) to confirm that testability verdicts match results from the causal-inference literature.
[Evaluation section (expressivity comparison and think-aloud studies)] The expressivity evaluation and think-aloud studies are presented without quantitative details on participant numbers, task protocols, or how the static-analysis outputs were validated against ground-truth causal queries. This absence weakens the claim that PLanet is the most expressive DSL and that it reliably surfaces overlooked assumptions.

minor comments (2)

[Introduction and grammar definition] Clarify the precise scope of the grammar with respect to non-linear or time-varying assignment procedures; a short limitations paragraph would help readers understand where the matrix encoding may intentionally abstract away structure.
[Grammar and compilation sections] Add explicit cross-references between the matrix operators and the corresponding causal assumptions they encode (e.g., which operator corresponds to blocking).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important areas where additional rigor and clarity will strengthen the manuscript. We address each major comment below and indicate the revisions planned for the next version.

Referee: The central claim that the grammar's matrix representation and compilation to CSPs faithfully encode all relevant assumptions (independence, blocking, randomization, latent variables) so that static analysis correctly decides testability is load-bearing. Matrix algebra naturally represents linear marginals and assignments but risks distorting conditional independencies or non-linear intervention effects; the manuscript should supply either a formal soundness argument or an empirical validation against known causal diagrams for standard designs (e.g., blocked RCTs) to confirm that testability verdicts match results from the causal-inference literature.

Authors: We agree that the soundness of the matrix representation and CSP encoding is central to the contribution and that the current manuscript relies primarily on informal arguments and illustrative examples rather than a complete formal proof. We will add a dedicated subsection in the revised version that states a soundness theorem for the linear case, provides a proof sketch based on the correspondence between matrix constraints and d-separation, and includes an empirical validation table comparing PLanet's testability verdicts for blocked RCTs, randomized block designs, and Latin-square designs against established results from the causal-identifiability literature. We will also explicitly delimit the scope to linear models and note that non-linear intervention effects fall outside the current guarantees.

revision: yes

Referee: The expressivity evaluation and think-aloud studies are presented without quantitative details on participant numbers, task protocols, or how the static-analysis outputs were validated against ground-truth causal queries. This absence weakens the claim that PLanet is the most expressive DSL and that it reliably surfaces overlooked assumptions.

Authors: We accept that the evaluation section would benefit from more precise and quantitative reporting. The current draft mentions the think-aloud studies and expressivity comparison at a summary level but does not include a participant table, full task protocol, or explicit validation procedure against ground-truth queries. In the revision we will expand the evaluation sections to add (1) a table with exact participant counts, recruitment criteria, and session durations, (2) the complete task protocol and materials, and (3) a new validation subsection that lists the ground-truth causal queries for each evaluated design and reports agreement with PLanet's static-analysis outputs. These additions will make the empirical claims more transparent and reproducible.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in PLanet's formalization and analysis

The paper introduces a new composable grammar for experimental assignment procedures, represented via matrix algebra and compiled to constraint satisfaction problems to support static analysis of testable causal queries. No derivation steps, equations, or results in the abstract or context reduce a claimed outcome to its own inputs by construction, self-definition, or fitted parameters renamed as predictions. Expressivity comparisons are made against prior external DSLs, and reflections/user studies provide independent evaluation. No load-bearing self-citations, uniqueness theorems imported from authors, or ansatzes smuggled via citation appear in the material. The contribution is a constructive definition of a DSL and analysis tool rather than a circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that matrix algebra can faithfully encode assignment procedures and their causal implications; no free parameters or invented entities are introduced in the abstract description.

axioms (1)

domain assumption: Experimental assignment procedures and their causal implications can be represented using matrix algebra without loss of critical structure.
The grammar is explicitly grounded in matrix algebra per the abstract.
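
As an illustration of what composing designs as matrices can look like, the sketch below follows the Figure 1 caption: a crossed design pairs conditions trial by trial, and a nested design expands each outer trial into a full inner row, in the style of a Kronecker product. These definitions of `cross` and `nest` are one reading of that caption, written for this example; PLanet's operators may be defined differently.

```python
def cross(left, right):
    """Pair every row of `left` with every row of `right`, trial by trial.

    Assumed semantics: both designs have the same number of trials per row,
    and each output cell combines the two conditions shown in that trial.
    """
    if len(left[0]) != len(right[0]):
        raise ValueError("crossed designs need equal row lengths")
    return [
        [l + r for l, r in zip(lrow, rrow)]
        for lrow in left
        for rrow in right
    ]


def nest(outer, inner):
    """Expand each outer trial into a full inner row (Kronecker-style blocks).

    Assumed semantics: the outer condition stays fixed while the inner
    conditions run through one complete inner row.
    """
    return [
        [o + i for o in orow for i in irow]
        for orow in outer
        for irow in inner
    ]


if __name__ == "__main__":
    outer = [["A", "B"], ["B", "A"]]   # counterbalanced outer variable
    inner = [["X", "Y"], ["Y", "X"]]   # counterbalanced inner variable
    for row in cross(outer, inner):
        print("crossed:", row)
    for row in nest(outer, inner):
        print("nested: ", row)
```

In the crossed output each row shows A, B, X, and Y but only two of the four combinations, while each nested row contains all four combinations with the outer condition held fixed across adjacent trials.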

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: We introduce a grammar of composable operators for constructing experimental assignment procedures grounded in matrix algebra. The PLanet DSL implements this grammar and compiles PLanet programs into constraint satisfaction problems over matrices.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: PLanet checks for positivity of main, interaction, and time-based effects... We implement static analyses to check for time-based confounding and positivity violations.
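
The passage on positivity and time-based confounding can be illustrated with a small sketch. The definitions below are simplified assumptions, not the paper's analyses: a main effect counts as "positive" if every condition of the variable appears somewhere in the design, an interaction is positive if every combination appears, and a variable is treated as confounded with time if every plan shows the same condition in each trial position. The function names are hypothetical.

```python
from itertools import product


def main_effect_positive(design, index, levels):
    """True if every level of the variable at tuple position `index` occurs."""
    seen = {cell[index] for row in design for cell in row}
    return set(levels) <= seen


def interaction_positive(design, levels_by_index):
    """True if every combination of levels occurs in at least one cell."""
    seen = {tuple(cell) for row in design for cell in row}
    return set(product(*levels_by_index)) <= seen


def order_confounded(design, index):
    """True if the variable's level is fully determined by trial position,
    i.e. all plans show the same level in every column."""
    return all(
        len({row[t][index] for row in design}) == 1
        for t in range(len(design[0]))
    )


if __name__ == "__main__":
    # Each cell is a tuple (first variable, second variable); rows are plans.
    counterbalanced = [[("A", "X"), ("B", "Y")], [("B", "Y"), ("A", "X")]]
    fixed_order = [[("A", "X"), ("B", "Y")], [("A", "X"), ("B", "Y")]]
    print(main_effect_positive(counterbalanced, 0, ["A", "B"]))
    print(interaction_positive(counterbalanced, [["A", "B"], ["X", "Y"]]))
    print(order_confounded(counterbalanced, 0))
    print(order_confounded(fixed_order, 0))
```

In the counterbalanced example both main effects pass the check but the interaction does not, because the combinations (A, Y) and (B, X) never occur; in the fixed-order example condition A always comes first, so its effect cannot be separated from trial order.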

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages

[1] American Psychological Association. n.d. APA Dictionary of Psychology. https://dictionary.apa.org/. [Accessed 02-04-2025]
[2] Eytan Bakshy, Dean Eckles, and Michael S Bernstein. 2014. Designing and deploying online field experiments. In Proceedings of the 23rd international conference on World wide web. ACM, 283–292
[3] Alan F Blackwell, Carol Britton, A Cox, Thomas RG Green, Corin Gurr, Gada Kadoda, MS Kutar, Martin Loomes, Chrystopher L Nehaniv, Marian Petre, et al
[4] Cognitive dimensions of notations: Design tools for cognitive technology. In International Conference on Cognitive Technology. Springer, 325–341
[5] Graeme Blair, Jasper Cooper, Alexander Coppock, and Macartan Humphreys
[6] Declaring and diagnosing research designs. American Political Science Review 113, 3 (2019), 838–859
[7] Donald T. Campbell and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin, Boston
[8] Liwei Chan, Tzu Wei Mi, Zung Hao Hsueh, Yi Ci Huang, and Ming Yun Hsu
[10] Gary Charness, Uri Gneezy, and Michael A. Kuhn. 2012. Experimental methods: Between-subject and within-subject design. Journal of Economic Behavior & Organization 81, 1 (2012), 1–8. doi:10.1016/j.jebo.2011.08.009
[11] Thomas D. Cook and Donald T. Campbell. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Houghton Mifflin, Boston
[12] David Roxbee Cox and Nancy Reid. 2000. The Theory of the Design of Experiments. CRC Press
[13] Valdemar Danry, Pat Pataranutaporn, Yaoli Mao, and Pattie Maes. 2023. Don’t Just Tell Me, Ask Me: AI Systems that Intelligently Frame Explanations as Questions Improve Human Logical Discernment Accuracy over Causal AI explanations. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for ... doi:10.1145/3544548.3580672
[14] Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340
[15] Smit Desai and Jessie Chin. 2023. OK Google, Let’s Learn: Using Voice User Interfaces for Informal Self-Regulated Learning of Health Topics among Younger and Older Adults. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). ACM, 1–21. doi:10.1145/3544548.3581507
[16] Jean-Marie Dufour and Patrick Frenkiel. 2018. Advanced Control Design with Application to Electromechanical Systems. Springer, Cham, Switzerland
[17] Alexander Eiselmayer, Chatchavan Wacharamanotham, Michel Beaudouin-Lafon, and Wendy Mackay. 2019. Touchstone2: An Interactive Environment for Exploring Trade-offs in HCI Experiment Design. (2019)
[18] Ronald A. Fisher. 1935. The Design of Experiments. Oliver and Boyd, Edinburgh
[19] Takao Fujii, Katie Seaborn, and Madeleine Steeds. 2024. Silver-Tongued and Sundry: Exploring Intersectional Pronouns with ChatGPT. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 511, 14 pages. doi:10.1145/3613904.3642303
[20] Miguel A. Hernan and James M. Robins. 2025. Causal Inference: What If. CRC Press, Boca Raton
[21] Arata Jingu, Nihar Sabnis, Paul Strohmeier, and Jürgen Steimle. 2024. Shaping Compliance: Inducing Haptic Illusion of Compliance in Different Shapes with Electrotactile Grains. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. ACM, 1–13. doi:10.1145/3613904.3641907
[22] Eunice Jun, Maureen Daum, Jared Roesch, Sarah E Chasins, Emery D Berger, Rene Just, and Katharina Reinecke. 2019. Tea: A High-level Language and Runtime System for Automating Statistical Analysis. In Proceedings of the 32nd Annual Symposium on User Interface Software and Technology. ACM
[23] Eunice Jun, Edward Misback, Jeffrey Heer, and René Just. 2024. rTisane: Externalizing conceptual models for data analysis prompts reconsideration of domain assumptions and facilitates statistical modeling. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–16
[24] Eunice Jun, Audrey Seo, Jeffrey Heer, and René Just. 2022. Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–16
[25] Predrag Klasnja, Eric B Hekler, Saul Shiffman, Audrey Boruvka, Daniel Almirall, Ambuj Tewari, and Susan A Murphy. 2015. Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychol. 34S, Suppl (Dec. 2015), 1220–1228
[26] Susan Lin, Jeremy Warner, J.D. Zamfirescu-Pereira, Matthew G Lee, Sauhard Jain, Shanqing Cai, Piyawat Lertvittayakumjorn, Michael Xuelin Huang, Shumin Zhai, Bjoern Hartmann, and Can Liu. 2024. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu,...
[27] Wendy E Mackay, Caroline Appert, Michel Beaudouin-Lafon, Olivier Chapuis, Yangzhou Du, Jean-Daniel Fekete, and Yves Guiard. 2007. Touchstone: exploratory design of experiments. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 1425–1434
[28] Sebastian Musslick, Anastasia Cherkaev, Ben Draut, Ahsan Sajjad Butt, Pierce Darragh, Vivek Srikumar, Matthew Flatt, and Jonathan D. Cohen. 2022. SweetPea: A standard language for factorial experimental design. Behavior Research Methods 54 (2022), 805–829. doi:10.3758/s13428-021-01598-2
[29] Jakob Nielsen. 1994. Usability Engineering. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
[30] N.L.J. 1959. Planning of Experiments. By D. R. Cox. [Pp. vi+308. New York: John Wiley and Sons, Inc. London: Chapman and Hall, Ltd., 1958. 60s.]. Journal of the Institute of Actuaries 85, 2 (1959), 317–319. doi:10.1017/S0020268100038063
[31] Judea Pearl. 2009. Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press
[33] Arvind Satyanarayan, Bongshin Lee, Donghao Ren, Jeffrey Heer, John Stasko, John Thompson, Matthew Brehmer, and Zhicheng Liu. 2020. Critical Reflections on Visualization Authoring Systems. IEEE Transactions on Visualization and Computer Graphics 26, 1 (2020), 461–471. doi:10.1109/TVCG.2019.2934281
[34] Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer
[35] Vega-lite: A grammar of interactive graphics. IEEE transactions on visualization and computer graphics 23, 1 (2017), 341–350
[36] Howard J. Seltman. 2018. Experimental Design and Analysis. Carnegie Mellon University. https://www.stat.cmu.edu/~hseltman/309/Book/Book.pdf
[37] William Shadish, Thomas D Cook, and Donald Thomas Campbell. 2002. Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin Boston, MA
[38] Xiyuan Shen, Chun Yu, Xutong Wang, Chen Liang, Haozhan Chen, and Yuanchun Shi. 2024. MouseRing: Always-available Touchpad Interaction with IMU Rings. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. ACM, 1–13. doi:10.1145/3613904.3642225
[39] N.J.A. Sloane and R.H. Hardin. 2017. Gosset: A General-purpose program for designing experiments. http://neilsloane.com/gosset/
[40] Ashley Suh, Ab Mosca, Eugene Wu, and Remco Chang. 2022. A grammar of hypotheses for visualization, data, and analysis. arXiv preprint arXiv:2204.14267 (2022)
[41] Yuan Sun, Magdalayna Drivas, Mengqi Liao, and S. Shyam Sundar. 2023. When Recommender Systems Snoop into Social Media, Users Trust them Less for Health Advice. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 818, 14 pages. doi:10.1145...
[42] Emi Tanaka. 2021. Edibble: An R-package to construct designs using the grammar of experimental design. https://github.com/emitanaka/edibble
[43] Emma Tosch, Eytan Bakshy, Emery D Berger, David D Jensen, and J Eliot B Moss. 2019. Planalyzer: Assessing threats to the validity of online experiments. Proceedings of the ACM on Programming Languages 3, OOPSLA (2019), 1–30
[44] Manohar Narhar Vartak. 1955. On an Application of Kronecker Product of Matrices to Statistical Designs. The Annals of Mathematical Statistics 26, 3 (1955), 420–438. doi:10.1214/aoms/1177728488
ACM UIST ’26, November 2–November 5, 2026, Detroit, Michigan. Bielicke et al.
[45] Steeven Villa, Jasmin Niess, Takuro Nakao, Jonathan Lazar, Albrecht Schmidt, and Tonja-Katrin Machulla. 2023. Understanding Perception of Human Augmentation: A Mixed-Method Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 65, 16...
[46] Leland Wilkinson. 1999. The grammar of graphics. Springer-Verlag, Berlin, Heidelberg
[47] Guande Wu, Jing Qian, Sonia Castelo Quispe, Shaoyu Chen, João Rulff, and Claudio T. Silva. 2024. ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24) (Honolulu, HI, USA). ACM. doi:10.1145/3613904.3642772
[48] Tianhong Catherine Yu, Nancy Wang, Sarah Ellenbogen, and Cindy Hsin-Liu Kao
[49] Skinergy: Machine-Embroidered Silicone-Textile Composites as On-Skin Self-Powered Input Sensors. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (San Francisco, CA, USA) (UIST ’23). Association for Computing Machinery, New York, NY, USA, Article 33, 15 pages. doi:10.1145/3586183.3606729
[50] within_subjects(emotion_ve)
[51] counterbalance(emotion_ve)
    )
    exercise_design = (
    Design()
[52] within_subjects(exercise_intensity)
[53] counterbalance(exercise_intensity)
    )
    design = nest(exercise_design, emotion_design)
    assign(participants, design)
    (a) PLanet (b) edibble. Figure 6: Nested design from Sweating the Details: Emotion Recognition and the Influence of Physical Exertion in Virtual Reality Exergaming [29] implemented in PLanet and edibble. PLanet corre...
[54] within_subjects(tools)
[55] counterbalance(tools)
[56] within_subjects(tasks)
[57] counterbalance(tasks)
[58] num_trials(2)
    )
    assignment = assign(participants, design)
    print(assignment)
    (b) Figure 7: R1’s intended experimental assignment and PLanet program. (a) The original spreadsheet R1 had previously used to manually construct experimental assignments for their study. (b) The PLanet program for R1’s experiment, which produces the same set of ord...

[1] [1]

American Psychological Association. n.d.. APA Dictionary of Psychology — dictionary.apa.org. https://dictionary.apa.org/. [Accessed 02-04-2025]

work page 2025

[2] [2]

Eytan Bakshy, Dean Eckles, and Michael S Bernstein. 2014. Designing and deploy- ing online field experiments. InProceedings of the 23rd international conference on World wide web. ACM, 283–292

work page 2014

[3] [3]

Alan F Blackwell, Carol Britton, A Cox, Thomas RG Green, Corin Gurr, Gada Kadoda, MS Kutar, Martin Loomes, Chrystopher L Nehaniv, Marian Petre, et al

work page

[4] [4]

InInternational Conference on Cognitive Technology

Cognitive dimensions of notations: Design tools for cognitive technology. InInternational Conference on Cognitive Technology. Springer, 325–341

work page

[5] [5]

Graeme Blair, Jasper Cooper, Alexander Coppock, and Macartan Humphreys

work page

[6] [6]

Declaring and diagnosing research designs.American Political Science Review113, 3 (2019), 838–859

work page 2019

[7] [7]

Campbell and Julian C

Donald T. Campbell and Julian C. Stanley. 1963.Experimental and Quasi- Experimental Designs for Research. Houghton Mifflin, Boston

work page 1963

[8] [8]

Liwei Chan, Tzu Wei Mi, Zung Hao Hsueh, Yi Ci Huang, and Ming Yun Hsu

work page

[9] [10]

Gary Charness, Uri Gneezy, and Michael A. Kuhn. 2012. Experimental methods: Between-subject and within-subject design.Journal of Economic Behavior & Organization81, 1 (2012), 1–8. doi:10.1016/j.jebo.2011.08.009

work page doi:10.1016/j.jebo.2011.08.009 2012

[10] [11]

Cook and Donald T

Thomas D. Cook and Donald T. Campbell. 1979.Quasi-Experimentation: Design and Analysis Issues for Field Settings. Houghton Mifflin, Boston

work page 1979

[11] [12]

2000.The Theory of the Design of Experiments

David Roxbee Cox and Nancy Reid. 2000.The Theory of the Design of Experiments. CRC Press

work page 2000

[12] [13]

Valdemar Danry, Pat Pataranutaporn, Yaoli Mao, and Pattie Maes. 2023. Don’t Just Tell Me, Ask Me: AI Systems that Intelligently Frame Explanations as Questions Improve Human Logical Discernment Accuracy over Causal AI explanations. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for ... doi:10.1145/3544548.3580672

[13] [14]

Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340

[14] [15]

Smit Desai and Jessie Chin. 2023. OK Google, Let’s Learn: Using Voice User Interfaces for Informal Self-Regulated Learning of Health Topics among Younger and Older Adults. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). ACM, 1–21. doi:10.1145/3544548.3581507

[15] [16]

Jean-Marie Dufour and Patrick Frenkiel. 2018. Advanced Control Design with Application to Electromechanical Systems. Springer, Cham, Switzerland

[16] [17]

Alexander Eiselmayer, Chatchavan Wacharamanotham, Michel Beaudouin-Lafon, and Wendy Mackay. 2019. Touchstone2: An Interactive Environment for Exploring Trade-offs in HCI Experiment Design. (2019)

[17] [18]

Ronald A. Fisher. 1935. The Design of Experiments. Oliver and Boyd, Edinburgh

[18] [19]

Takao Fujii, Katie Seaborn, and Madeleine Steeds. 2024. Silver-Tongued and Sundry: Exploring Intersectional Pronouns with ChatGPT. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 511, 14 pages. doi:10.1145/3613904.3642303

[19] [20]

Miguel A. Hernan and James M. Robins. 2025. Causal Inference: What If. CRC Press, Boca Raton

[20] [21]

Arata Jingu, Nihar Sabnis, Paul Strohmeier, and Jürgen Steimle. 2024. Shaping Compliance: Inducing Haptic Illusion of Compliance in Different Shapes with Electrotactile Grains. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. ACM, 1–13. doi:10.1145/3613904.3641907

[21] [22]

Eunice Jun, Maureen Daum, Jared Roesch, Sarah E Chasins, Emery D Berger, Rene Just, and Katharina Reinecke. 2019. Tea: A High-level Language and Runtime System for Automating Statistical Analysis. In Proceedings of the 32nd Annual Symposium on User Interface Software and Technology. ACM

[22] [23]

Eunice Jun, Edward Misback, Jeffrey Heer, and René Just. 2024. rTisane: Externalizing conceptual models for data analysis prompts reconsideration of domain assumptions and facilitates statistical modeling. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–16

[23] [24]

Eunice Jun, Audrey Seo, Jeffrey Heer, and René Just. 2022. Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–16

[24] [25]

Predrag Klasnja, Eric B Hekler, Saul Shiffman, Audrey Boruvka, Daniel Almirall, Ambuj Tewari, and Susan A Murphy. 2015. Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychol. 34S, Suppl (Dec. 2015), 1220–1228

[25] [26]

Susan Lin, Jeremy Warner, J.D. Zamfirescu-Pereira, Matthew G Lee, Sauhard Jain, Shanqing Cai, Piyawat Lertvittayakumjorn, Michael Xuelin Huang, Shumin Zhai, Bjoern Hartmann, and Can Liu. 2024. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu,...

[26] [27]

Wendy E Mackay, Caroline Appert, Michel Beaudouin-Lafon, Olivier Chapuis, Yangzhou Du, Jean-Daniel Fekete, and Yves Guiard. 2007. Touchstone: exploratory design of experiments. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 1425–1434

[27] [28]

Sebastian Musslick, Anastasia Cherkaev, Ben Draut, Ahsan Sajjad Butt, Pierce Darragh, Vivek Srikumar, Matthew Flatt, and Jonathan D. Cohen. 2022. SweetPea: A standard language for factorial experimental design. Behavior Research Methods 54 (2022), 805–829. doi:10.3758/s13428-021-01598-2

[28] [29]

Jakob Nielsen. 1994. Usability Engineering. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA

[29] [30]

N.L.J. 1959. Planning of Experiments. By D. R. Cox. [Pp. vi+308. New York: John Wiley and Sons, Inc. London: Chapman and Hall, Ltd., 1958. 60s.]. Journal of the Institute of Actuaries 85, 2 (1959), 317–319. doi:10.1017/S0020268100038063

[30] [31]

Judea Pearl. 2009. Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press

[31] [33]

Arvind Satyanarayan, Bongshin Lee, Donghao Ren, Jeffrey Heer, John Stasko, John Thompson, Matthew Brehmer, and Zhicheng Liu. 2020. Critical Reflections on Visualization Authoring Systems. IEEE Transactions on Visualization and Computer Graphics 26, 1 (2020), 461–471. doi:10.1109/TVCG.2019.2934281

[32] [34]

Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2017. Vega-lite: A grammar of interactive graphics. IEEE transactions on visualization and computer graphics 23, 1 (2017), 341–350

[34] [36]

Howard J. Seltman. 2018. Experimental Design and Analysis. Carnegie Mellon University. https://www.stat.cmu.edu/~hseltman/309/Book/Book.pdf

[35] [37]

William Shadish, Thomas D Cook, and Donald Thomas Campbell. 2002. Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin, Boston, MA

[36] [38]

Xiyuan Shen, Chun Yu, Xutong Wang, Chen Liang, Haozhan Chen, and Yuanchun Shi. 2024. MouseRing: Always-available Touchpad Interaction with IMU Rings. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. ACM, 1–13. doi:10.1145/3613904.3642225

[37] [39]

N.J.A. Sloane and R.H. Hardin. 2017. Gosset: A General-purpose program for designing experiments. http://neilsloane.com/gosset/

[38] [40]

Ashley Suh, Ab Mosca, Eugene Wu, and Remco Chang. 2022. A grammar of hypotheses for visualization, data, and analysis. arXiv preprint arXiv:2204.14267 (2022)

[39] [41]

Yuan Sun, Magdalayna Drivas, Mengqi Liao, and S. Shyam Sundar. 2023. When Recommender Systems Snoop into Social Media, Users Trust them Less for Health Advice. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 818, 14 pages. doi:10.1145/3544548.3581123

[40] [42]

Emi Tanaka. 2021. Edibble: An R-package to construct designs using the grammar of experimental design. https://github.com/emitanaka/edibble

[41] [43]

Emma Tosch, Eytan Bakshy, Emery D Berger, David D Jensen, and J Eliot B Moss. 2019. Planalyzer: Assessing threats to the validity of online experiments. Proceedings of the ACM on Programming Languages 3, OOPSLA (2019), 1–30

[42] [44]

Manohar Narhar Vartak. 1955. On an Application of Kronecker Product of Matrices to Statistical Designs. The Annals of Mathematical Statistics 26, 3 (1955), 420–438. doi:10.1214/aoms/1177728488

ACM UIST ’26, November 2–November 5, 2026, Detroit, Michigan. Bielicke et al.

[43] [45]

Steeven Villa, Jasmin Niess, Takuro Nakao, Jonathan Lazar, Albrecht Schmidt, and Tonja-Katrin Machulla. 2023. Understanding Perception of Human Augmentation: A Mixed-Method Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 65, 16...

[44] [46]

Leland Wilkinson. 1999. The grammar of graphics. Springer-Verlag, Berlin, Heidelberg

[45] [47]

Guande Wu, Jing Qian, Sonia Castelo Quispe, Shaoyu Chen, João Rulff, and Claudio T. Silva. 2024. ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24) (Honolulu, HI, USA). ACM. doi:10.1145/3613904.3642772

[46] [48]

Tianhong Catherine Yu, Nancy Wang, Sarah Ellenbogen, and Cindy Hsin-Liu Kao. 2023. Skinergy: Machine-Embroidered Silicone-Textile Composites as On-Skin Self-Powered Input Sensors. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (San Francisco, CA, USA) (UIST ’23). Association for Computing Machinery, New York, NY, USA, Article 33, 15 pages. doi:10.1145/3586183.3606729

(a) PLanet (listing truncated in the source; only the final lines are recoverable):

        .within_subjects(emotion_ve)
        .counterbalance(emotion_ve)
    )
    exercise_design = (
        Design()
        .within_subjects(exercise_intensity)
        .counterbalance(exercise_intensity)
    )
    design = nest(exercise_design, emotion_design)

    assign(participants, design)

(b) edibble (listing not recoverable)

Figure 6: Nested design from Sweating the Details: Emotion Recognition and the Influence of Physical Exertion in Virtual Reality Exergaming [29] implemented in PLanet and edibble. PLanet corre...

(b) PLanet program (listing truncated in the source). The recoverable calls, in order, are within_subjects(tools), counterbalance(tools), within_subjects(tasks), counterbalance(tasks), and num_trials(2), followed by assignment = assign(participants, design) and print(assignment).

Figure 7: R1’s intended experimental assignment and PLanet program. (a) The original spreadsheet R1 had previously used to manually construct experimental assignments for their study. (b) The PLanet program for R1’s experiment, which produces the same set of ord...