TCARD: Nearly Balanced Two-Level Designs with Treatment Cardinality Constraints with an Application to LLM Prompt Engineering
Pith reviewed 2026-05-21 03:11 UTC · model grok-4.3
The pith
Nearly balanced two-level designs under treatment cardinality constraints minimize the first two components of the generalized word-length pattern.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A TCARD is a two-level design whose design matrix has constant row sums equal to k. Nearly balanced TCARDs attain the minimal values of the first two components of the generalized word-length pattern. Projection quality depends on each factor appearing equally often across treatments and on every pair of factors appearing together the same number of times. The Φ_BCD objective is introduced to penalize deviations from these two regularities and is shown to relate to (M,S)-optimality, the centered UE(s²) criterion, and Bayesian D-optimality.
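The two count-based regularities can be read directly off a design matrix. The sketch below is illustrative only: the function names and the small 4 × 5 matrix are hypothetical and do not come from the paper.

```python
from itertools import combinations

def replications(X):
    """Column sums: number of treatments (rows) that include each factor."""
    p = len(X[0])
    return [sum(row[j] for row in X) for j in range(p)]

def concurrences(X):
    """For each factor pair (i, j), the number of rows that include both."""
    p = len(X[0])
    return {(i, j): sum(row[i] * row[j] for row in X)
            for i, j in combinations(range(p), 2)}

# Hypothetical design with n = 4 treatments, p = 5 factors, k = 2 factors per row.
X = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
]
k = 2
assert all(sum(row) == k for row in X)  # the cardinality constraint

r = replications(X)    # per-factor counts
lam = concurrences(X)  # per-pair counts
print(r, sorted(set(lam.values())))
```

In this toy design the replications take only the values 1 and 2 and the concurrences only 0 and 1, so neither count can be made perfectly uniform for these sizes.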
What carries the argument
The Φ_BCD criterion, which penalizes replication imbalance and concurrence dispersion to produce nearly balanced designs.
Load-bearing premise
That good projection behavior is determined by balanced factor replications and uniform pairwise concurrences.
What would settle it
Constructing a TCARD for specific n, p, k values that has unbalanced replications or non-uniform concurrences yet still achieves lower values in the first two generalized word-length pattern components than the nearly balanced ones.
Original abstract
Modern experimental designs often face the so-called treatment cardinality constraint, which is the constraint on the number of included factors in each treatment. Experiments with such constraints are commonly encountered in engineering simulation, AI system tuning, and large-scale system verification. This calls for the development of adequate designs to enable statistical efficiency for modeling and analysis within feasible constraints. In this work, we study two-level designs under this $k$-treatment cardinality constraint (TCARD), where the design matrix $\mathbf{X} \in \{0,1\}^{n \times p}$ has constant row sums equal to $k$. Although TCARDs are closely related to balanced incomplete block designs (BIBDs), exact BIBD structure is unavailable for many practical $(n,p,k)$ combinations. This leads to the notion of nearly balanced TCARDs, which we prove minimize the first two components of the generalized word-length pattern. We also show that good projection behavior in this setting is governed by two count-based regularities: balanced factor replications and uniform pairwise concurrences. Motivated by this characterization, we then propose the Balanced Concurrence Deviation ($\Phi_{\mathrm{BCD}}$), a model-free objective that jointly penalizes replication imbalance and concurrence dispersion. We further show that this criterion is closely connected to classical optimality principles, including $(M,S)$-optimality, centered $\mathrm{UE}(s^2)$ criterion, and Bayesian $D$-optimality. To construct designs minimizing $\Phi_{\mathrm{BCD}}$, we develop a coordinate-exchange (CE) algorithm with efficient incremental updates, together with a simulation-based procedure for calibrating the criterion weights to the intended downstream task. Numerical experiments confirm that the proposed method compares favorably with existing alternatives across a range of problem sizes and constraint strengths.
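A minimal sketch of an exchange search under the row-sum constraint follows. Several things here are assumptions rather than the paper's method: the criterion `phi` is a stand-in (a weighted sum of squared deviations of replications and concurrences from their means), the weights `w_rep` and `w_con` are hypothetical, and the criterion is recomputed from scratch instead of using the incremental updates the abstract mentions. Because flipping a single entry would change a row sum, each move swaps one included factor for one excluded factor within a row.

```python
import random
from itertools import combinations

def phi(X, w_rep=1.0, w_con=1.0):
    """Assumed stand-in criterion: dispersion of replications and concurrences."""
    p = len(X[0])
    r = [sum(row[j] for row in X) for j in range(p)]
    lam = [sum(row[i] * row[j] for row in X)
           for i, j in combinations(range(p), 2)]
    r_bar = sum(r) / p
    lam_bar = sum(lam) / len(lam)
    return (w_rep * sum((v - r_bar) ** 2 for v in r)
            + w_con * sum((v - lam_bar) ** 2 for v in lam))

def swap_exchange(n, p, k, sweeps=20, seed=0):
    """Greedy within-row swaps; every row keeps exactly k ones throughout."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        ones = set(rng.sample(range(p), k))  # random feasible starting row
        X.append([1 if j in ones else 0 for j in range(p)])
    best = phi(X)
    for _ in range(sweeps):
        improved = False
        for row in X:
            for a in range(p):
                for b in range(p):
                    if row[a] == 1 and row[b] == 0:
                        row[a], row[b] = 0, 1      # drop factor a, add factor b
                        val = phi(X)
                        if val < best - 1e-12:
                            best, improved = val, True
                        else:
                            row[a], row[b] = 1, 0  # revert a non-improving swap
        if not improved:
            break  # local optimum with respect to single swaps
    return X, best

X, value = swap_exchange(n=6, p=4, k=2)
```

The search stops at a local optimum, so in practice one would likely restart from several seeds and keep the best design found.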
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces nearly balanced two-level designs under treatment cardinality constraints (TCARDs), where the n × p design matrix X has entries in {0,1} and constant row sums equal to k. It proves that such nearly balanced TCARDs minimize the first two components of the generalized word-length pattern. The authors characterize desirable projection properties via two count-based regularities—balanced factor replications and uniform pairwise concurrences—and propose the Balanced Concurrence Deviation criterion Φ_BCD that jointly penalizes replication imbalance and concurrence dispersion. Connections are established to (M,S)-optimality, centered UE(s²), and Bayesian D-optimality. A coordinate-exchange algorithm with incremental updates is developed to minimize Φ_BCD, together with a simulation-based weight calibration procedure. Numerical experiments and an application to LLM prompt engineering are presented.
Significance. If the central proofs and characterizations hold, the work supplies a practical, model-free route to efficient designs for experiments subject to cardinality constraints that arise in simulation, system verification, and AI tuning. The explicit links to classical optimality criteria and the efficient construction algorithm constitute clear strengths; the LLM application illustrates relevance beyond traditional DOE settings.
major comments (2)
- [Abstract and §2] The claim that good projection behavior is governed by balanced factor replications and uniform pairwise concurrences is used to motivate Φ_BCD. With constant row sums equal to k, however, factors are coupled within each row; this global constraint can induce higher-order dependencies in projections onto factor subsets that pairwise concurrence counts alone may not control. An explicit argument or small-scale counter-example showing that the two regularities remain sufficient under the cardinality constraint would be needed to secure the connection to optimality principles.
- [§3, Theorem 1] The proof that nearly balanced TCARDs minimize the first two generalized word-length pattern components is load-bearing for the subsequent development. The derivation should be checked to confirm that it fully incorporates the row-sum constraint rather than treating the counts as independent; any implicit assumption that higher-order terms vanish under the two regularities needs explicit statement.
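The coupling raised in the major comments can be made concrete with two counting identities that hold for any 0-1 matrix with constant row sums k. The symbols r_j and λ_{jl} are assumed here for illustration and need not match the paper's notation; this is a sketch of what the constraint implies, not the proof of Theorem 1.

```latex
\begin{align*}
r_j &= \sum_{i=1}^{n} x_{ij}, &
\lambda_{jl} &= \sum_{i=1}^{n} x_{ij}\,x_{il} \quad (j < l), \\
\sum_{j=1}^{p} r_j &= \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij} = nk, &
\sum_{j<l} \lambda_{jl} &= \sum_{i=1}^{n} \binom{k}{2} = \frac{n\,k(k-1)}{2}.
\end{align*}
```

Both totals are fixed by n and k, so the mean replication nk/p and the mean concurrence nk(k-1)/(p(p-1)) are the same for every TCARD of a given size. Only the spread of the counts around these means can differ between designs, which is why the counts cannot be treated as independent.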
minor comments (2)
- [§4] Notation: The precise definition of the weights inside Φ_BCD and the simulation-based calibration procedure would benefit from a dedicated algorithmic box or pseudocode to improve reproducibility.
- [§5] Figures 2–4: The boxplots comparing Φ_BCD against baselines would be clearer if the number of independent replications and the exact performance metric (e.g., average projection variance) were stated in the captions.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The two major comments raise important points about the motivation for our criterion and the details of the central proof. We address each below and will revise the manuscript accordingly to strengthen the presentation.
- Referee: [Abstract and §2] The claim that good projection behavior is governed by balanced factor replications and uniform pairwise concurrences is used to motivate Φ_BCD. With constant row sums equal to k, however, factors are coupled within each row; this global constraint can induce higher-order dependencies in projections onto factor subsets that pairwise concurrence counts alone may not control. An explicit argument or small-scale counter-example showing that the two regularities remain sufficient under the cardinality constraint would be needed to secure the connection to optimality principles.
Authors: We agree that the row-sum constraint introduces coupling among factors and that an explicit verification of sufficiency is warranted. In the revised manuscript we will add a short subsection in §2 containing a small-scale numerical example (n=6, p=4, k=2) that compares designs satisfying the two regularities against alternatives that violate them. The example will demonstrate that, under the constant row-sum constraint, the first two generalized word-length pattern components (and the associated projection discrepancies) are indeed minimized precisely when the replication and concurrence regularities hold, with no residual higher-order effects appearing in the low-order projections relevant to our optimality arguments. revision: yes
- Referee: [§3, Theorem 1] The proof that nearly balanced TCARDs minimize the first two generalized word-length pattern components is load-bearing for the subsequent development. The derivation should be checked to confirm that it fully incorporates the row-sum constraint rather than treating the counts as independent; any implicit assumption that higher-order terms vanish under the two regularities needs explicit statement.
Authors: We have re-examined the proof of Theorem 1. The derivation begins from the definition of the generalized word-length pattern for a 0-1 matrix with fixed row sums equal to k and expresses the relevant inner products directly in terms of the constrained row totals; the counts are therefore not treated as independent. To make this transparent, the revised proof will include an additional paragraph that (i) explicitly substitutes the row-sum constraint into the expressions for the first two word-length components and (ii) states that, once the nearly-balanced conditions are imposed, all higher-order contributions to these components are identically zero by the algebraic identity used in the proof. We believe these clarifications will fully address the concern while leaving the result unchanged. revision: yes
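The (n=6, p=4, k=2) case named in the first response is small enough to explore by hand. The authors' own example is not reproduced on this page; the sketch below compares two hypothetical designs of that size, one built from all six factor pairs and one that repeats two disjoint pairs. Helper names are illustrative.

```python
from itertools import combinations

def counts(X):
    """Return (replications, pairwise concurrences) for a 0-1 design matrix."""
    p = len(X[0])
    r = [sum(row[j] for row in X) for j in range(p)]
    lam = [sum(row[i] * row[j] for row in X)
           for i, j in combinations(range(p), 2)]
    return r, lam

def from_blocks(blocks, p):
    """Turn a list of factor subsets into rows of a 0-1 design matrix."""
    return [[1 if j in b else 0 for j in range(p)] for b in blocks]

p, k = 4, 2
# Every pair of the 4 factors appears exactly once: 6 rows.
balanced = from_blocks(list(combinations(range(p), k)), p)
# Same size and row sums, but only two disjoint pairs, each repeated 3 times.
clumped = from_blocks([(0, 1)] * 3 + [(2, 3)] * 3, p)

r_bal, lam_bal = counts(balanced)
r_clu, lam_clu = counts(clumped)
print(r_bal, lam_bal)  # [3, 3, 3, 3] [1, 1, 1, 1, 1, 1]
print(r_clu, lam_clu)  # [3, 3, 3, 3] [3, 0, 0, 0, 0, 3]
```

Both designs have perfectly balanced replications, so the replication condition alone does not separate them; only the concurrence counts do. In the clumped design four of the six factor pairs never appear together.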
Circularity Check
No significant circularity in derivation chain
The paper introduces nearly balanced TCARDs after noting the unavailability of exact BIBDs for many (n,p,k) tuples, then proves these minimize the first two generalized word-length pattern components and shows projection behavior is governed by balanced replications plus uniform pairwise concurrences. These are standard count-based characterizations in design theory; the subsequent definition of Φ_BCD as a penalty on imbalance and dispersion is motivated by but not identical to those counts, and the paper separately connects Φ_BCD to (M,S)-optimality, centered UE(s²), and Bayesian D-optimality via explicit arguments rather than by re-labeling. The simulation-based weight calibration is a practical tuning step for the LLM application and does not retroactively define the optimality claims. No quoted step equates a claimed result to its own inputs by construction, and no load-bearing premise collapses to a self-citation whose content is unverified outside the present work. The derivation therefore remains self-contained against external design-theoretic benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- weights in Φ_BCD
axioms (1)
- domain assumption: TCARDs are closely related to balanced incomplete block designs (BIBDs)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
We also show that good projection behavior in this setting is governed by two count-based regularities: balanced factor replications and uniform pairwise concurrences. ... propose the Balanced Concurrence Deviation (Φ_BCD), a model-free objective that jointly penalizes replication imbalance and concurrence dispersion.
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · absolute_floor_iff_bare_distinguishability · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
nearly balanced TCARDs, which we prove minimize the first two components of the generalized word-length pattern. ... B1 is minimized if and only if NB1 ... B2 is minimized if and only if NB2
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.