PLanet: Formalizing and Analyzing Assignment Procedures in the Design of Experiments
Pith reviewed 2026-05-22 16:09 UTC · model grok-4.3
The pith
PLanet's grammar and matrix representation for experimental designs enable static analysis to identify testable causal queries under different assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By defining a grammar of composable operators for assignment procedures and representing them in matrix algebra, PLanet compiles designs into constraint satisfaction problems. This allows a static analysis to determine the testability of causal queries under varying assumptions, making design choices and assumptions explicit without requiring full procedural code.
What carries the argument
Composable grammar of operators for assignment procedures, grounded in matrix algebra and compiled to constraint satisfaction problems over matrices.
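To make the idea of composable operators over matrices concrete, here is a minimal sketch. It is not PLanet's API: the functions `counterbalance` and `nest` below are hypothetical stand-ins that only borrow operator names from the paper's figures, and the nesting semantics (a Kronecker-style expansion of every outer condition into a full inner sequence) is an assumption made for illustration.

```python
from itertools import permutations, product


def counterbalance(conditions):
    """Hypothetical operator: one row per ordering, one column per trial."""
    return [list(order) for order in permutations(conditions)]


def nest(outer, inner):
    """Hypothetical operator: expand each outer condition into a full inner
    sequence, pairing every outer row with every inner row."""
    rows = []
    for outer_row, inner_row in product(outer, inner):
        rows.append([(o, i) for o in outer_row for i in inner_row])
    return rows


# Illustrative condition names, not taken from the paper.
exercise = counterbalance(["low", "high"])
emotion = counterbalance(["happy", "sad"])
design = nest(exercise, emotion)

for row in design:
    print(row)
```

Each row of `design` is one candidate order that a participant could be assigned; representing the design as a matrix is what lets later checks be phrased as constraints over rows and columns.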
Load-bearing premise
The matrix algebra representation and constraint satisfaction encoding must capture all relevant assumptions about variable relationships without omitting causal structure or introducing artifacts.
What would settle it
An experimental design in which PLanet's static analysis labels a causal query as testable but a manual causal graph analysis shows it is not identifiable, or the reverse.
Experimental designs reflect assumptions about variable relationships that determine what causal queries researchers can answer through the experiment. Accounting for and communicating these assumptions is essential for drawing valid, generalizable conclusions from scientific experiments. Unfortunately, existing experimental design tools elide these details, expecting researchers to reason about design decisions and assumptions on their own. To surface assumptions and enable design exploration, we introduce a grammar of composable operators for constructing experimental assignment procedures grounded in matrix algebra. The PLanet DSL implements this grammar and compiles PLanet programs into constraint satisfaction problems over matrices. Together, PLanet's composable grammar and matrix representation enable a static analysis to determine which causal queries are testable under different assumptions. In an expressivity evaluation, PLanet was the most expressive of existing DSLs. Critical reflections with the authors of these DSLs revealed that PLanet makes design choices explicit without requiring procedural specification. Think-aloud studies showed that PLanet facilitated design exploration and surfaced assumptions researchers may otherwise overlook.
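The abstract's phrase "constraint satisfaction problems over matrices" can be illustrated with a toy example. The sketch below searches for a Latin square (every condition once per row and once per column) by plain backtracking. It is an assumption-laden illustration: the function name `solve_latin_square` is hypothetical, and a brute-force search stands in for whatever solver PLanet actually uses.

```python
from itertools import permutations


def solve_latin_square(conditions):
    """Find a matrix whose rows and columns each contain every condition once.

    Row constraint: each row is a permutation of the conditions.
    Column constraint: no condition repeats within a column.
    """
    n = len(conditions)
    candidate_rows = list(permutations(conditions))

    def extend(square):
        if len(square) == n:
            return square
        for row in candidate_rows:
            # Accept the row only if it differs from every earlier row
            # in every column position.
            if all(row[c] != prev[c] for prev in square for c in range(n)):
                result = extend(square + [row])
                if result is not None:
                    return result
        return None

    return extend([])


square = solve_latin_square(["A", "B", "C"])
for row in square:
    print(row)
```

The design choice worth noting is that the experimenter states only the constraints; the concrete assignment matrix is whatever satisfies them, which is the sense in which a declarative design avoids procedural specification.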
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PLanet, a domain-specific language for formalizing experimental assignment procedures via a composable grammar grounded in matrix algebra. PLanet programs are compiled into constraint satisfaction problems over matrices, which in turn support a static analysis that determines which causal queries are testable under different assumptions about variable relationships. The work reports an expressivity evaluation in which PLanet was the most expressive of the DSLs compared, critical reflections with authors of those DSLs, and think-aloud studies indicating that the approach surfaces assumptions and supports design exploration.
Significance. If the matrix representation and CSP encoding are shown to be sound and complete for the range of designs claimed, the work could meaningfully advance experimental design practice in HCI and related fields by making implicit causal assumptions explicit and enabling automated testability analysis. The formal grounding in matrix algebra and the combination of expressivity comparison with user studies are positive features; the result would be more impactful if it included falsifiable checks against established causal-identifiability results.
major comments (2)
- [Sections describing the matrix representation and static analysis (likely §3–4)] The central claim that the grammar's matrix representation and compilation to CSPs faithfully encode all relevant assumptions (independence, blocking, randomization, latent variables) so that static analysis correctly decides testability is load-bearing. Matrix algebra naturally represents linear marginals and assignments but risks distorting conditional independencies or non-linear intervention effects; the manuscript should supply either a formal soundness argument or an empirical validation against known causal diagrams for standard designs (e.g., blocked RCTs) to confirm that testability verdicts match results from the causal-inference literature.
- [Evaluation section (expressivity comparison and think-aloud studies)] The expressivity evaluation and think-aloud studies are presented without quantitative details on participant numbers, task protocols, or how the static-analysis outputs were validated against ground-truth causal queries. This absence weakens the claim that PLanet is the most expressive DSL and that it reliably surfaces overlooked assumptions.
minor comments (2)
- [Introduction and grammar definition] Clarify the precise scope of the grammar with respect to non-linear or time-varying assignment procedures; a short limitations paragraph would help readers understand where the matrix encoding may intentionally abstract away structure.
- [Grammar and compilation sections] Add explicit cross-references between the matrix operators and the corresponding causal assumptions they encode (e.g., which operator corresponds to blocking).
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments highlight important areas where additional rigor and clarity will strengthen the manuscript. We address each major comment below and indicate the revisions planned for the next version.
-
Referee: The central claim that the grammar's matrix representation and compilation to CSPs faithfully encode all relevant assumptions (independence, blocking, randomization, latent variables) so that static analysis correctly decides testability is load-bearing. Matrix algebra naturally represents linear marginals and assignments but risks distorting conditional independencies or non-linear intervention effects; the manuscript should supply either a formal soundness argument or an empirical validation against known causal diagrams for standard designs (e.g., blocked RCTs) to confirm that testability verdicts match results from the causal-inference literature.
Authors: We agree that the soundness of the matrix representation and CSP encoding is central to the contribution and that the current manuscript relies primarily on informal arguments and illustrative examples rather than a complete formal proof. We will add a dedicated subsection in the revised version that states a soundness theorem for the linear case, provides a proof sketch based on the correspondence between matrix constraints and d-separation, and includes an empirical validation table comparing PLanet's testability verdicts for blocked RCTs, randomized block designs, and Latin-square designs against established results from the causal-identifiability literature. We will also explicitly delimit the scope to linear models and note that non-linear intervention effects fall outside the current guarantees. revision: yes
-
Referee: The expressivity evaluation and think-aloud studies are presented without quantitative details on participant numbers, task protocols, or how the static-analysis outputs were validated against ground-truth causal queries. This absence weakens the claim that PLanet is the most expressive DSL and that it reliably surfaces overlooked assumptions.
Authors: We accept that the evaluation section would benefit from more precise and quantitative reporting. The current draft mentions the think-aloud studies and expressivity comparison at a summary level but does not include a participant table, full task protocol, or explicit validation procedure against ground-truth queries. In the revision we will expand Section 5 to add (1) a table with exact participant counts, recruitment criteria, and session durations, (2) the complete task protocol and materials, and (3) a new validation subsection that lists the ground-truth causal queries for each evaluated design and reports agreement with PLanet's static-analysis outputs. These additions will make the empirical claims more transparent and reproducible. revision: yes
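The validation that the referee requests and the authors promise amounts to comparing the tool's verdicts with a reference check on a causal diagram. As a sketch of what such a reference check could look like, the code below tests d-separation using the standard ancestral moral graph construction. It is independent of PLanet; the graph encoding (a dict from each node to its set of parents) and the variable names are assumptions for illustration, and it assumes the two query nodes are not in the conditioning set.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG `parents`."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Moralize the ancestral subgraph: link each child to its parents,
    # link parents that share a child, and ignore edge direction.
    neighbours = {node: set() for node in keep}
    for child in keep:
        child_parents = list(parents.get(child, ()))
        for p in child_parents:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for a in child_parents:
            for b in child_parents:
                if a != b:
                    neighbours[a].add(b)
    # Search for a path from x to y that avoids the conditioning set.
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nxt in neighbours[node]:
            if nxt not in seen and nxt not in given:
                seen.add(nxt)
                stack.append(nxt)
    return True


# Randomized assignment: treatment has no parents.
randomized = {"outcome": {"treatment", "skill"}}
# Self-selected assignment: skill influences treatment.
observational = {"outcome": {"treatment", "skill"}, "treatment": {"skill"}}

print(d_separated(randomized, "treatment", "skill"))
print(d_separated(observational, "treatment", "skill"))
```

A validation table of the kind described could then list, per design, the reference verdict from a check like this next to the verdict produced by the static analysis.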
Circularity Check
No significant circularity in PLanet's formalization and analysis
The paper introduces a new composable grammar for experimental assignment procedures, represented via matrix algebra and compiled to constraint satisfaction problems to support static analysis of testable causal queries. No derivation steps, equations, or results in the abstract or context reduce a claimed outcome to its own inputs by construction, self-definition, or fitted parameters renamed as predictions. Expressivity comparisons are made against prior external DSLs, and reflections/user studies provide independent evaluation. No load-bearing self-citations, uniqueness theorems imported from authors, or ansatzes smuggled via citation appear in the material. The contribution is a constructive definition of a DSL and analysis tool rather than a circular derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Experimental assignment procedures and their causal implications can be represented using matrix algebra without loss of critical structure.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
We introduce a grammar of composable operators for constructing experimental assignment procedures grounded in matrix algebra. The PLanet DSL implements this grammar and compiles PLanet programs into constraint satisfaction problems over matrices.
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
PLanet checks for positivity of main, interaction, and time-based effects... We implement static analyses to check for time-based confounding and positivity violations.
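The quoted passage mentions positivity checks without defining them here. One plausible reading, assumed for this sketch, is that every condition must actually occur at every trial position somewhere in the assignment matrix; otherwise a time-based effect for that condition cannot be estimated. The function `positivity_violations` is hypothetical and is not taken from PLanet.

```python
def positivity_violations(assignment, conditions):
    """List (condition, position) pairs that never occur in the assignment.

    `assignment` is a matrix: one row per participant order,
    one column per trial position.
    """
    n_positions = len(assignment[0])
    observed = {
        (condition, position)
        for row in assignment
        for position, condition in enumerate(row)
    }
    return [
        (condition, position)
        for condition in conditions
        for position in range(n_positions)
        if (condition, position) not in observed
    ]


counterbalanced = [["A", "B"], ["B", "A"]]
fixed_order = [["A", "B"], ["A", "B"]]

print(positivity_violations(counterbalanced, ["A", "B"]))  # []
print(positivity_violations(fixed_order, ["A", "B"]))      # [('A', 1), ('B', 0)]
```

Under this reading, the fixed-order design confounds condition with trial position, which is the kind of time-based confounding the passage refers to.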
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
American Psychological Association. n.d. APA Dictionary of Psychology (dictionary.apa.org). https://dictionary.apa.org/. [Accessed 02-04-2025]
-
[2]
Eytan Bakshy, Dean Eckles, and Michael S Bernstein. 2014. Designing and deploying online field experiments. In Proceedings of the 23rd international conference on World wide web. ACM, 283–292
-
[3]
Alan F Blackwell, Carol Britton, A Cox, Thomas RG Green, Corin Gurr, Gada Kadoda, MS Kutar, Martin Loomes, Chrystopher L Nehaniv, Marian Petre, et al
-
[4]
Cognitive dimensions of notations: Design tools for cognitive technology. In International Conference on Cognitive Technology. Springer, 325–341
-
[5]
Graeme Blair, Jasper Cooper, Alexander Coppock, and Macartan Humphreys
-
[6]
Declaring and diagnosing research designs. American Political Science Review 113, 3 (2019), 838–859
-
[7]
Donald T. Campbell and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin, Boston
-
[8]
Liwei Chan, Tzu Wei Mi, Zung Hao Hsueh, Yi Ci Huang, and Ming Yun Hsu
-
[10]
Gary Charness, Uri Gneezy, and Michael A. Kuhn. 2012. Experimental methods: Between-subject and within-subject design. Journal of Economic Behavior & Organization 81, 1 (2012), 1–8. doi:10.1016/j.jebo.2011.08.009
-
[11]
Thomas D. Cook and Donald T. Campbell. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Houghton Mifflin, Boston
-
[12]
David Roxbee Cox and Nancy Reid. 2000. The Theory of the Design of Experiments. CRC Press
-
[13]
Valdemar Danry, Pat Pataranutaporn, Yaoli Mao, and Pattie Maes. 2023. Don’t Just Tell Me, Ask Me: AI Systems that Intelligently Frame Explanations as Questions Improve Human Logical Discernment Accuracy over Causal AI explanations. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for ...
-
[14]
Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340
-
[15]
Smit Desai and Jessie Chin. 2023. OK Google, Let’s Learn: Using Voice User Interfaces for Informal Self-Regulated Learning of Health Topics among Younger and Older Adults. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). ACM, 1–21. doi:10.1145/3544548.3581507
-
[16]
Jean-Marie Dufour and Patrick Frenkiel. 2018. Advanced Control Design with Application to Electromechanical Systems. Springer, Cham, Switzerland
-
[17]
Alexander Eiselmayer, Chatchavan Wacharamanotham, Michel Beaudouin-Lafon, and Wendy Mackay. 2019. Touchstone2: An Interactive Environment for Exploring Trade-offs in HCI Experiment Design. (2019)
-
[18]
Ronald A. Fisher. 1935. The Design of Experiments. Oliver and Boyd, Edinburgh
-
[19]
Takao Fujii, Katie Seaborn, and Madeleine Steeds. 2024. Silver-Tongued and Sundry: Exploring Intersectional Pronouns with ChatGPT. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’24). Association for Computing Machinery, New York, NY, USA, Article 511, 14 pages. doi:10.1145/3613904.3642303
-
[20]
Miguel A. Hernan and James M. Robins. 2025. Causal Inference: What If. CRC Press, Boca Raton
-
[21]
Arata Jingu, Nihar Sabnis, Paul Strohmeier, and Jürgen Steimle. 2024. Shaping Compliance: Inducing Haptic Illusion of Compliance in Different Shapes with Electrotactile Grains. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. ACM, 1–13. doi:10.1145/3613904.3641907
-
[22]
Eunice Jun, Maureen Daum, Jared Roesch, Sarah E Chasins, Emery D Berger, Rene Just, and Katharina Reinecke. 2019. Tea: A High-level Language and Runtime System for Automating Statistical Analysis. In Proceedings of the 32nd Annual Symposium on User Interface Software and Technology. ACM
-
[23]
Eunice Jun, Edward Misback, Jeffrey Heer, and René Just. 2024. rTisane: Externalizing conceptual models for data analysis prompts reconsideration of domain assumptions and facilitates statistical modeling. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–16
-
[24]
Eunice Jun, Audrey Seo, Jeffrey Heer, and René Just. 2022. Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–16
-
[25]
Predrag Klasnja, Eric B Hekler, Saul Shiffman, Audrey Boruvka, Daniel Almirall, Ambuj Tewari, and Susan A Murphy. 2015. Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychol. 34S, Suppl (Dec. 2015), 1220–1228
-
[26]
Susan Lin, Jeremy Warner, J.D. Zamfirescu-Pereira, Matthew G Lee, Sauhard Jain, Shanqing Cai, Piyawat Lertvittayakumjorn, Michael Xuelin Huang, Shumin Zhai, Bjoern Hartmann, and Can Liu. 2024. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu,...
-
[27]
Wendy E Mackay, Caroline Appert, Michel Beaudouin-Lafon, Olivier Chapuis, Yangzhou Du, Jean-Daniel Fekete, and Yves Guiard. 2007. Touchstone: exploratory design of experiments. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 1425–1434
-
[28]
Sebastian Musslick, Anastasia Cherkaev, Ben Draut, Ahsan Sajjad Butt, Pierce Darragh, Vivek Srikumar, Matthew Flatt, and Jonathan D. Cohen. 2022. SweetPea: A standard language for factorial experimental design. Behavior Research Methods 54 (2022), 805–829. doi:10.3758/s13428-021-01598-2
-
[29]
Jakob Nielsen. 1994. Usability Engineering. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
-
[30]
N.L.J. 1959. Planning of Experiments. By D. R. Cox. [Pp. vi+308. New York: John Wiley and Sons, Inc. London: Chapman and Hall, Ltd., 1958. 60s.]. Journal of the Institute of Actuaries 85, 2 (1959), 317–319. doi:10.1017/S0020268100038063
-
[31]
Judea Pearl. 2009. Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press
-
[33]
Arvind Satyanarayan, Bongshin Lee, Donghao Ren, Jeffrey Heer, John Stasko, John Thompson, Matthew Brehmer, and Zhicheng Liu. 2020. Critical Reflections on Visualization Authoring Systems. IEEE Transactions on Visualization and Computer Graphics 26, 1 (2020), 461–471. doi:10.1109/TVCG.2019.2934281
-
[34]
Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer
-
[35]
Vega-lite: A grammar of interactive graphics. IEEE transactions on visualization and computer graphics 23, 1 (2017), 341–350
-
[36]
Howard J. Seltman. 2018. Experimental Design and Analysis. Carnegie Mellon University. https://www.stat.cmu.edu/~hseltman/309/Book/Book.pdf
-
[37]
William Shadish, Thomas D Cook, and Donald Thomas Campbell. 2002. Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin Boston, MA
-
[38]
Xiyuan Shen, Chun Yu, Xutong Wang, Chen Liang, Haozhan Chen, and Yuanchun Shi. 2024. MouseRing: Always-available Touchpad Interaction with IMU Rings. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. ACM, 1–13. doi:10.1145/3613904.3642225
-
[39]
N.J.A. Sloane and R.H. Hardin. 2017. Gosset: A General-purpose program for designing experiments. http://neilsloane.com/gosset/
- [40]
-
[41]
Yuan Sun, Magdalayna Drivas, Mengqi Liao, and S. Shyam Sundar. 2023. When Recommender Systems Snoop into Social Media, Users Trust them Less for Health Advice. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 818, 14 pages. doi:10.1145...
-
[42]
Emi Tanaka. 2021. Edibble: An R-package to construct designs using the grammar of experimental design. https://github.com/emitanaka/edibble
-
[43]
Emma Tosch, Eytan Bakshy, Emery D Berger, David D Jensen, and J Eliot B Moss. 2019. Planalyzer: Assessing threats to the validity of online experiments. Proceedings of the ACM on Programming Languages 3, OOPSLA (2019), 1–30
-
[44]
Manohar Narhar Vartak. 1955. On an Application of Kronecker Product of Matrices to Statistical Designs. The Annals of Mathematical Statistics 26, 3 (1955), 420–438. doi:10.1214/aoms/1177728488
(Running header of the reviewed paper: ACM UIST ’26, November 2–November 5, 2026, Detroit, Michigan. Bielicke et al.)
-
[45]
Steeven Villa, Jasmin Niess, Takuro Nakao, Jonathan Lazar, Albrecht Schmidt, and Tonja-Katrin Machulla. 2023. Understanding Perception of Human Augmentation: A Mixed-Method Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 65, 16...
-
[46]
Leland Wilkinson. 1999. The grammar of graphics. Springer-Verlag, Berlin, Heidelberg
-
[47]
Guande Wu, Jing Qian, Sonia Castelo Quispe, Shaoyu Chen, João Rulff, and Claudio T. Silva. 2024. ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24) (Honolulu, HI, USA). ACM. doi:10.1145/3613904.3642772
-
[48]
Tianhong Catherine Yu, Nancy Wang, Sarah Ellenbogen, and Cindy Hsin-Liu Kao
-
[49]
Skinergy: Machine-Embroidered Silicone-Textile Composites as On-Skin Self-Powered Input Sensors. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (San Francisco, CA, USA) (UIST ’23). Association for Computing Machinery, New York, NY, USA, Article 33, 15 pages. doi:10.1145/3586183.3606729
PLanet listing from Figure 6 (opening lines not recovered):
    .within_subjects(emotion_ve)
    .counterbalance(emotion_ve)
)
exercise_design = (
    Design()
    .within_subjects(exercise_intensity)
    .counterbalance(exercise_intensity)
)
design = nest(exercise_design, emotion_design)

assign(participants, design)
Panels: (a) PLanet, (b) edibble.
Figure 6: Nested design from Sweating the Details: Emotion Recognition and the Influence of Physical Exertion in Virtual Reality Exergaming [29] implemented in PLanet and edibble. PLanet corre...
PLanet listing from Figure 7, panel (b) (opening lines not recovered):
    .within_subjects(tools)
    .counterbalance(tools)
    .within_subjects(tasks)
    .counterbalance(tasks)
    .num_trials(2)
)

assignment = assign(participants, design)
print(assignment)
Figure 7: R1’s intended experimental assignment and PLanet program. (a) The original spreadsheet R1 had previously used to manually construct experimental assignments for their study. (b) The PLanet program for R1’s experiment, which produces the same set of ord...