PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
Pith reviewed 2026-05-15 13:07 UTC · model grok-4.3
The pith
Weighting distillation problems by student pass rate p(1-p) focuses training on the competence frontier and improves math benchmark scores by up to 8.2 points.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PACED weights each training problem by w(p) = p(1-p), where p is the student's empirical pass rate, thereby concentrating gradient updates on problems at the frontier of current competence. The authors prove that the Beta kernel family w(p) = p^α(1-p)^β is the leading-order optimal weighting under the observed SNR boundary-collapse structure and remains minimax-robust to misspecification (worst-case efficiency loss O(δ²)). Across the Qwen3, Qwen2.5, and Llama-3 families, the method reports state-of-the-art results within the authors' experimental setting on MATH-500, AIME 2024, and AIME 2025, with gains of up to +8.2 over unweighted distillation and +3.6 over the AKL baseline, while reducing forgetting to 1.4 percent in distillation and 0.6 percent in self-distillation.
What carries the argument
The Beta kernel weighting w(p) = p(1-p), which multiplies each problem's loss by the product of its pass rate and failure rate to emphasize intermediate competence.
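A minimal Python sketch of this weighting, assuming each rollout is scored as simply correct or incorrect. The helper names (empirical_pass_rate, beta_kernel_weight, weighted_loss) and the choice to normalize by the total weight are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def empirical_pass_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of student rollouts judged correct for one problem."""
    if not outcomes:
        raise ValueError("need at least one rollout per problem")
    return sum(outcomes) / len(outcomes)


def beta_kernel_weight(p: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Beta kernel w(p) = p^alpha * (1 - p)^beta; alpha = beta = 1 gives p(1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("pass rate must lie in [0, 1]")
    return (p ** alpha) * ((1.0 - p) ** beta)


def weighted_loss(losses: Sequence[float], pass_rates: Sequence[float]) -> float:
    """Weighted mean of per-problem losses.

    Dividing by the total weight is an assumption made here for readability;
    the overall scale could equally be absorbed into the learning rate.
    """
    weights = [beta_kernel_weight(p) for p in pass_rates]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # every problem is fully solved or fully unsolved
    return sum(w * l for w, l in zip(weights, losses)) / total
```

With the default exponents, a problem solved in 4 of 8 rollouts receives weight 0.25, while problems solved in 0 of 8 or 8 of 8 rollouts receive weight 0 and drop out of the update.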
If this is right
- New state-of-the-art results on MATH-500, AIME 2024, and AIME 2025 across Qwen and Llama model families.
- Forgetting drops to 1.4 percent during distillation and 0.6 percent during self-distillation.
- A two-stage forward-then-reverse KL schedule adds up to +5.8 points on the hardest benchmarks over standard forward KL.
- Gains of up to +3.6 over the strong AKL baseline are obtained with no architectural changes.
- The method uses only student rollouts and requires no extra hyperparameters.
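The forward-then-reverse KL schedule mentioned in the list above can be sketched for a single categorical distribution as follows. This is an illustration under stated assumptions: the function names, the hard switch at a fixed step, and the stage1_ratio parameter are hypothetical, and the sketch works on plain probability lists rather than model logits.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two categorical distributions over the same support."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def forward_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """KL(teacher || student): penalizes the student for missing teacher mass."""
    return kl_divergence(teacher, student)


def reverse_kl(teacher: Sequence[float], student: Sequence[float]) -> float:
    """KL(student || teacher): penalizes student mass where the teacher has little."""
    return kl_divergence(student, teacher)


def two_stage_loss(step: int, total_steps: int,
                   teacher: Sequence[float], student: Sequence[float],
                   stage1_ratio: float = 0.5) -> float:
    """Forward KL for the first stage1_ratio of training, reverse KL afterwards.

    stage1_ratio is a hypothetical parameter introduced for this sketch.
    """
    switch_step = int(stage1_ratio * total_steps)
    if step < switch_step:
        return forward_kl(teacher, student)
    return reverse_kl(teacher, student)
```

In a PACED-style setup, the value returned here would be the per-problem loss that is then multiplied by the pass-rate weight.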
Where Pith is reading between the lines
- The same weighting could be applied to non-math reasoning domains where problem difficulty varies across a student's competence range.
- Combining PACED with other curriculum or difficulty-scheduling techniques may produce further efficiency gains.
- The bell-curve SNR pattern may appear in reinforcement learning from human feedback or other on-policy training settings.
- Testing the approach on models larger than those in the current experiments would check whether the Beta kernel remains near-optimal at scale.
Load-bearing premise
The cross-problem gradient signal-to-noise ratio follows a bell curve over student pass rate, collapsing at both extremes.
What would settle it
Measuring the gradient SNR across problems on a fresh math benchmark and checking whether the bell-curve shape appears; if it does not, the optimality derivation for the Beta kernel would not hold.
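One way such a check could be set up is sketched below. It assumes a particular SNR definition (squared norm of the mean gradient divided by the mean squared deviation from that mean, computed within pass-rate bins); this page does not state the paper's exact definition, so both the definition and the function name are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def snr_by_pass_rate_bin(pass_rates: Sequence[float],
                         gradients: Sequence[Sequence[float]],
                         n_bins: int = 5) -> Dict[int, float]:
    """Group per-problem gradients by pass-rate bin and return an SNR per bin.

    SNR here = ||mean gradient||^2 / (unbiased mean squared deviation).
    Bins with fewer than two problems are skipped.
    """
    bins: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for p, g in zip(pass_rates, gradients):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the last bin
        bins[idx].append(g)

    result: Dict[int, float] = {}
    for idx, grads in sorted(bins.items()):
        if len(grads) < 2:
            continue
        dim = len(grads[0])
        mean = [sum(g[d] for g in grads) / len(grads) for d in range(dim)]
        signal = sum(m * m for m in mean)
        noise = sum(
            sum((g[d] - mean[d]) ** 2 for d in range(dim)) for g in grads
        ) / (len(grads) - 1)
        result[idx] = signal / noise if noise > 0.0 else float("inf")
    return result
```

A bell-shaped profile would show up as larger values in the middle bins than in the first and last bins.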
Original abstract
Standard LLM distillation treats all training problems equally -- wasting compute on problems the student has already mastered or cannot yet solve. We empirically show that this inefficiency has a precise gradient-level signature: the cross-problem gradient signal-to-noise ratio (SNR) follows a bell curve over student pass rate, collapsing at both extremes. We propose PACED, which weights each problem by $w(p) = p(1{-}p)$ where $p$ is the student's empirical pass rate -- concentrating training on the zone of proximal development. This requires only student rollouts, no architectural changes, and no hyperparameters. We prove the Beta kernel $w(p) = p^\alpha(1{-}p)^\beta$ is the leading-order optimal weight family arising from the SNR boundary-collapse structure, and is minimax-robust under misspecification (worst-case efficiency loss $O(\delta^2)$). Across Qwen3, Qwen2.5, and Llama-3 families, PACED sets a new state of the art in our experimental setting on MATH-500, AIME~2024, and AIME~2025, improving over unweighted distillation by up to $\mathbf{+8.2}$ and over the strong AKL baseline by up to $\mathbf{+3.6}$, while reducing forgetting to $\mathbf{1.4\%}$ and $\mathbf{0.6\%}$ in distillation and self-distillation. A two-stage forward-then-reverse KL schedule pushes gains further to $\mathbf{+5.8}$ over standard forward KL on the hardest benchmark.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PACED, a weighting scheme for LLM distillation and self-distillation that assigns weights w(p) = p(1-p) to training problems based on the student's empirical pass rate p. This is motivated by an empirical observation that the gradient signal-to-noise ratio (SNR) across problems follows a bell curve over p, collapsing at low and high pass rates. The authors claim to prove that the Beta kernel family is the leading-order optimal weighting arising from this SNR structure and is minimax-robust to misspecification, with worst-case efficiency loss O(δ²). Empirically, PACED reports state-of-the-art results in the authors' experimental setting on MATH-500, AIME 2024, and AIME 2025 across the Qwen3, Qwen2.5, and Llama-3 model families, with improvements of up to +8.2 over unweighted distillation and +3.6 over the AKL baseline, while reducing forgetting to 1.4% in distillation and 0.6% in self-distillation.
Significance. If the SNR-based optimality holds, PACED offers a simple, hyperparameter-free improvement to distillation efficiency by focusing compute on the zone of proximal development. The reported gains across multiple model families and the reduction in forgetting are substantial and would represent a meaningful advance for LLM training pipelines if reproducible. The minimax-robustness claim adds theoretical value if the underlying SNR functional form is validated.
major comments (3)
- [§3] §3 (SNR optimality derivation): the claim that the observed bell-curve SNR structure implies the Beta kernel w(p)=p(1-p) is leading-order optimal requires an explicit step-by-step derivation showing how SNR(p) ∝ p(1-p) produces this weighting; if the empirical SNR is asymmetric or contains higher-order terms once pass-rate estimator variance is included, the specific functional form loses its justification.
- [§4] §4 (Empirical SNR analysis): the central assumption that SNR follows a symmetric bell curve peaking near p=0.5 with collapse at extremes must be supported by quantitative plots, statistics, and error bars that account for finite-sample variance in the pass-rate estimator p; without this, the optimality argument rests on an unverified functional form.
- [Results] Results section (Tables 1-3): the attribution of the +3.6 gain over AKL and +8.2 over unweighted to the theoretical construction (rather than incidental concentration effects) needs an ablation comparing the Beta kernel to other weightings that achieve similar focus but deviate from the SNR-derived form.
minor comments (3)
- [Abstract] Abstract: the phrase 'in our experimental setting' when claiming SOTA should be replaced with a precise description of the training data, rollout budget, and evaluation protocol.
- [Methods] Notation: the definition of empirical pass rate p (including number of rollouts per problem and handling of ties) should be stated explicitly in the methods section.
- [§5] §5 (two-stage KL schedule): details on how the forward-then-reverse KL interacts with the PACED weighting function are needed to understand whether the additional +5.8 gain is independent of the Beta kernel.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback. We appreciate the opportunity to clarify the theoretical foundations and strengthen the empirical validation of PACED. Below we address each major comment point by point, indicating the revisions we will make to the manuscript.
-
Referee: [§3] §3 (SNR optimality derivation): the claim that the observed bell-curve SNR structure implies the Beta kernel w(p)=p(1-p) is leading-order optimal requires an explicit step-by-step derivation showing how SNR(p) ∝ p(1-p) produces this weighting; if the empirical SNR is asymmetric or contains higher-order terms once pass-rate estimator variance is included, the specific functional form loses its justification.
Authors: We will add an explicit step-by-step derivation in §3 showing how the SNR(p) ∝ p(1-p) structure leads to the leading-order optimality of the Beta kernel w(p)=p(1-p). The derivation starts from the gradient variance and signal terms, derives the optimal weighting as proportional to SNR(p), and shows that under the boundary collapse assumption, the Beta(2,2) kernel (i.e., p(1-p)) emerges as the minimax-robust choice. Regarding potential asymmetry or higher-order terms, we will include a discussion noting that the leading-order approximation holds even with moderate asymmetry, with the robustness bound O(δ²) covering misspecification. We will also add a note on the pass-rate estimator variance and its impact. revision: yes
-
Referee: [§4] §4 (Empirical SNR analysis): the central assumption that SNR follows a symmetric bell curve peaking near p=0.5 with collapse at extremes must be supported by quantitative plots, statistics, and error bars that account for finite-sample variance in the pass-rate estimator p; without this, the optimality argument rests on an unverified functional form.
Authors: We agree that the empirical validation needs strengthening. In the revised manuscript, we will include quantitative plots of SNR vs. p with error bars computed via bootstrap resampling to account for finite-sample variance in the pass-rate estimator. We will report statistics such as the peak location, symmetry measures, and goodness-of-fit to the bell curve. This will be added to §4, confirming the symmetric collapse at extremes. revision: yes
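The pass-rate part of the proposed bootstrap can be sketched as below. This covers only the uncertainty of the pass-rate estimator for one problem, not the SNR curve itself, and the function name, the percentile method, and the default settings are assumptions made for illustration.

```python
import random
from typing import Sequence, Tuple


def bootstrap_pass_rate_ci(outcomes: Sequence[int], n_boot: int = 2000,
                           level: float = 0.95, seed: int = 0
                           ) -> Tuple[float, float, float]:
    """Return (pass rate, lower bound, upper bound) from a percentile bootstrap.

    outcomes holds one 0/1 correctness label per rollout of a single problem.
    """
    if not outcomes:
        raise ValueError("need at least one rollout")
    rng = random.Random(seed)
    k = len(outcomes)
    # Resample the K rollouts with replacement and recompute the pass rate.
    estimates = sorted(
        sum(rng.choices(outcomes, k=k)) / k for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - tail) * n_boot))
    return sum(outcomes) / k, estimates[lo_idx], estimates[hi_idx]
```

Note that when every rollout is correct (or every rollout is incorrect) all resamples are identical, so the interval collapses to a point; a percentile bootstrap therefore understates uncertainty exactly at the extremes where the SNR is claimed to collapse.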
-
Referee: [Results] Results section (Tables 1-3): the attribution of the +3.6 gain over AKL and +8.2 over unweighted to the theoretical construction (rather than incidental concentration effects) needs an ablation comparing the Beta kernel to other weightings that achieve similar focus but deviate from the SNR-derived form.
Authors: To address this, we will add an ablation study in the results section comparing the Beta kernel w(p)=p(1-p) to other concentration-based weightings, such as a Gaussian kernel centered at p=0.5 with similar variance, and a uniform weighting over a focused interval [0.2,0.8]. This will demonstrate that the specific SNR-derived form provides additional gains beyond mere concentration, supporting the theoretical attribution. The ablation will be included in Tables 1-3 or as a new table. revision: yes
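The three weightings named in this response can be compared side by side with a short sketch. The Gaussian width sigma and the function names are assumptions chosen for illustration; the response does not specify them.

```python
import math
from typing import Callable, List, Sequence


def beta_kernel(p: float) -> float:
    """SNR-derived form w(p) = p(1 - p)."""
    return p * (1.0 - p)


def gaussian_kernel(p: float, sigma: float = 0.25) -> float:
    """Gaussian bump centered at p = 0.5; sigma is a hypothetical width."""
    return math.exp(-0.5 * ((p - 0.5) / sigma) ** 2)


def interval_kernel(p: float, lo: float = 0.2, hi: float = 0.8) -> float:
    """Uniform weight on the focused interval [0.2, 0.8], zero outside."""
    return 1.0 if lo <= p <= hi else 0.0


def normalized_weights(kernel: Callable[[float], float],
                       pass_rates: Sequence[float]) -> List[float]:
    """Evaluate a kernel on each pass rate and rescale the weights to sum to 1."""
    raw = [kernel(p) for p in pass_rates]
    total = sum(raw)
    return [r / total for r in raw] if total > 0.0 else raw
```

On a grid of pass rates k/8 for k = 0..8, the Beta kernel gives exactly zero weight to the endpoints, the Gaussian keeps a small positive weight there, and the interval kernel cuts off abruptly, which is the kind of difference the ablation is meant to isolate.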
Circularity Check
No circularity: SNR observation independently motivates Beta weighting whose optimality is proved from that structure, with gains reported as separate empirical results
The paper first reports an empirical observation that cross-problem gradient SNR follows a bell curve over student pass rate p, collapsing at extremes. It then defines w(p)=p(1-p) and proves the Beta family is leading-order optimal for any such boundary-collapse structure (with O(δ²) minimax robustness). The reported SOTA gains (+8.2 over unweighted, +3.6 over AKL) and forgetting reductions are presented as experimental outcomes on MATH-500/AIME, not as quantities derived from the weighting by construction. No equation equates the final performance metric to the input SNR fit; the theoretical step takes the observed functional signature as given and derives the weight family from it, which is a non-circular empirical-to-analytic pipeline. No self-citation chains or fitted-input-as-prediction patterns appear in the provided derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Cross-problem gradient signal-to-noise ratio follows a bell curve over student pass rate, collapsing at both extremes.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel; Jcost; cost_alpha_one_eq_jcost
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.
We prove the Beta kernel w(p)=p^α(1-p)^β is the leading-order optimal weight family arising from the SNR boundary-collapse structure... equals the inverse Bernoulli Fisher information (Remark 5).
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: embed_injective; Jcost_pos_of_ne_one; absolute_floor_iff_bare_distinguishability
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Under power-law regularity at the boundaries (Assumption 3(b)), any such SNR profile decomposes as p^{a'}(1-p)^{b'}·e^{r(p)} with bounded remainder r (Proposition 2).
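The first quoted passage identifies the kernel p(1-p) with the inverse Bernoulli Fisher information. That identity is standard and can be checked directly; the short derivation below is a reader's aid written for this page, not a reproduction of the paper's Remark 5.

```latex
% Bernoulli outcome X \in \{0,1\} with success probability p.
\begin{align*}
\ell(p; x) &= x \log p + (1 - x)\log(1 - p), \\
\frac{\partial^2 \ell}{\partial p^2} &= -\frac{x}{p^2} - \frac{1 - x}{(1 - p)^2}, \\
I(p) &= -\mathbb{E}\!\left[\frac{\partial^2 \ell}{\partial p^2}\right]
      = \frac{p}{p^2} + \frac{1 - p}{(1 - p)^2}
      = \frac{1}{p} + \frac{1}{1 - p}
      = \frac{1}{p(1 - p)}, \\
\frac{1}{I(p)} &= p(1 - p) = \operatorname{Var}(X).
\end{align*}
```

The quantity p(1-p) vanishes at p = 0 and p = 1 and attains its maximum of 1/4 at p = 1/2, which matches the bell shape the weighting is built around.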
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 4 Pith papers
-
Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning
EGRSD and CL-EGRSD advance the accuracy-length frontier in LLM reasoning by entropy-guided weighting of token-level distillation signals from the teacher.
-
Rubric-based On-policy Distillation
Rubric-based on-policy distillation allows training student models using only teacher responses by generating scoring rubrics from contrasts and using them for on-policy optimization, achieving superior performance an...
-
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
Anti-Self-Distillation reverses self-distillation signals via PMI to fix overconfidence on structural tokens, matching GRPO baseline accuracy 2-10x faster with up to 11.5 point gains across 4B-30B models.
-
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
Uni-OPD unifies on-policy distillation across LLMs and MLLMs with dual-perspective strategies that promote student exploration and enforce order-consistent teacher supervision based on outcome rewards.