How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
Pith reviewed 2026-05-20 13:34 UTC · model grok-4.3
The pith
Mu-GRPO lets GRPO-style training tolerate much staler rollout data from large sequential stages, matching standard performance while cutting wall-clock time by roughly half.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRPO-style algorithms can operate effectively under substantially higher rollout staleness than the low-staleness regime typically used. Mu-GRPO achieves this by scheduling training into a small number of large sequential stages that separate generation from optimization. It then stabilizes the process with two mechanisms: relaxed clipping, which keeps useful gradients from stale rollouts, and a negative-advantage veto, which discards destabilizing updates on negative-advantage responses.
What carries the argument
The Mu-GRPO framework: a four-stage sequential schedule, relaxed clipping, and a negative-advantage veto, which together enable high-staleness rollouts while preserving optimization stability.
If this is right
- Wall-clock training time drops by a factor of about two across tested models and benchmarks.
- The same performance level is reached without needing frequent rollout-optimization switches.
- Stale rollouts can supply the majority of training data without collapsing learning.
- The approach applies directly to existing GRPO pipelines on math reasoning tasks.
Where Pith is reading between the lines
- Similar stage-based scheduling could reduce overhead in other on-policy RL methods that currently require tight synchronization.
- The tolerance for staleness might allow larger effective batch sizes or longer training horizons in compute-limited environments.
- If the stabilization techniques generalize, they could support training on even older data collected from previous model versions.
Load-bearing premise
Relaxed clipping plus negative-advantage veto will keep optimization stable and unbiased even when all rollout data comes from the high-staleness regime of the four-stage schedule.
What would settle it
A controlled run on the same math benchmarks in which rollout staleness is raised to the Mu-GRPO level but relaxed clipping and the veto are disabled, showing a clear performance drop or training divergence.
Original abstract
Group Relative Policy Optimization (GRPO) has been a key driver of recent progress in reinforcement learning with verifiable rewards (RLVR) for large language models, but it is typically trained in a low-staleness, near-on-policy regime that incurs substantial system overhead. We ask a simple question: How off-policy can GRPO be? We show that GRPO-style algorithms can tolerate substantially larger rollout staleness than previously assumed, and propose Mu-GRPO, an RL training framework that organizes training into a small number (e.g., four) of large sequential generation-optimization stages. This design induces high rollout staleness while greatly reducing rollout-optimization switching overhead. To stabilize learning under stale data, Mu-GRPO combines relaxed clipping, which preserves useful stale-rollout gradients, with negative-advantage veto, which removes destabilizing post-trigger suffix updates in negative-advantage responses. Across five language models and multiple math reasoning benchmarks, Mu-GRPO matches or exceeds the performance of standard GRPO while achieving around 2x speedup in wall-clock training time, establishing a substantially improved performance-efficiency trade-off for LLM reinforcement learning.
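The trade-off the abstract describes, fewer rollout-optimization switches in exchange for staler data, can be made concrete with a small sketch. It is not the paper's scheduler. It assumes that every rollout consumed in a stage is generated from the policy snapshot taken at the start of that stage, that staleness is counted in optimizer steps, and that the total of 400 steps is an arbitrary illustrative number. The function name `schedule_stats` is hypothetical.

```python
def schedule_stats(total_steps: int, num_stages: int) -> dict:
    """Count generation phases and rollout staleness for an evenly staged schedule.

    Assumption (not from the paper): all rollouts used in a stage come from the
    policy snapshot at the start of that stage, so the data consumed at the
    j-th optimizer step of a stage is j policy versions old.
    """
    if num_stages <= 0 or total_steps % num_stages != 0:
        raise ValueError("total_steps must be a positive multiple of num_stages")
    steps_per_stage = total_steps // num_stages
    staleness = [j for _ in range(num_stages) for j in range(steps_per_stage)]
    return {
        "generation_phases": num_stages,
        "max_staleness": max(staleness),
        "mean_staleness": sum(staleness) / len(staleness),
    }


# One generation phase per optimizer step (near-on-policy) versus four large stages.
near_on_policy = schedule_stats(total_steps=400, num_stages=400)
four_stage = schedule_stats(total_steps=400, num_stages=4)
print(near_on_policy)
print(four_stage)
```

Under these assumptions the four-stage schedule needs 4 generation phases instead of 400, while the mean staleness rises from 0 to 49.5 optimizer steps. Real pipelines may measure staleness differently, for example per token or per policy version.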
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Mu-GRPO, a GRPO-style RL framework for LLMs that organizes training into a small number (e.g. four) of large sequential generation-optimization stages. This induces high rollout staleness to reduce switching overhead. Stabilization is achieved via relaxed clipping (to preserve useful stale gradients) and negative-advantage veto (to remove destabilizing post-trigger suffix updates). The authors report that Mu-GRPO matches or exceeds standard GRPO on multiple math reasoning benchmarks across five language models while delivering approximately 2x wall-clock speedup.
Significance. If the stabilization techniques prove reliable, the work would establish that GRPO can operate effectively under substantially higher staleness than previously assumed, yielding a meaningfully better performance-efficiency trade-off for RLVR. The multi-model, multi-benchmark evaluation is a positive feature. However, the absence of ablations, diagnostics, and statistical reporting on the key stabilization components limits the strength of the central claim.
major comments (3)
- [Methods] Methods section (description of Mu-GRPO and the four-stage schedule): the claim that relaxed clipping together with negative-advantage veto reliably stabilizes optimization under high-staleness rollouts is load-bearing, yet the manuscript provides no ablations isolating each component, no gradient-norm or policy-divergence measurements, and no statistics on advantage distributions or rollout staleness to confirm the techniques control off-policyness effects without systematic bias.
- [Experiments] Experiments / Results tables: benchmark scores are reported as matching or exceeding GRPO without error bars, confidence intervals, or statistical significance tests; exact hyper-parameter tables are also absent. This makes it impossible to assess whether the reported 2x speedup and performance parity are robust or could be affected by post-hoc benchmark selection.
- [Methods] Stabilization subsection: the negative-advantage veto is asserted to remove only destabilizing post-trigger suffix updates, but without direct measurements of how the veto interacts with the sequential schedule or any sensitivity analysis, the risk of introducing bias or missing instability in the high-staleness regime remains unaddressed.
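The policy-divergence measurements requested in the comments above could be computed from per-token log-probabilities alone. The sketch below is one common way to do it, not something the manuscript describes: it estimates KL(behavior || current) from tokens sampled under the behavior (rollout) policy, using the plain log-ratio average and a non-negative variant. The function name `kl_estimates` and the input layout are assumptions.

```python
import math


def kl_estimates(logp_behavior, logp_current):
    """Monte Carlo estimates of KL(behavior || current).

    Both inputs are per-token log-probabilities of the same sampled tokens,
    where the tokens were drawn from the behavior (rollout) policy.
    Returns (k1, k3): k1 averages -log(pi/beta) and can be negative on small
    samples; k3 averages (r - 1) - log r with r = pi/beta and is never negative.
    """
    if len(logp_behavior) != len(logp_current) or not logp_behavior:
        raise ValueError("inputs must be non-empty and of equal length")
    log_ratios = [c - b for b, c in zip(logp_behavior, logp_current)]
    n = len(log_ratios)
    k1 = sum(-lr for lr in log_ratios) / n
    k3 = sum(math.exp(lr) - 1.0 - lr for lr in log_ratios) / n
    return k1, k3


k1, k3 = kl_estimates([-1.0, -2.0], [-1.5, -1.0])
print(k1, k3)
```

Logging such an estimate per optimizer step, alongside gradient norms, would show whether divergence grows within a stage as the rollouts age.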
minor comments (2)
- [Abstract] Abstract: the phrase 'post-trigger suffix updates' is used without a brief definition or illustrative example; adding one sentence of clarification would improve accessibility.
- [Experiments] Figure or table captions: ensure all reported speedups explicitly state the baseline (standard GRPO with what batching/parallelism) to allow direct comparison.
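A toy mask can illustrate what "post-trigger suffix updates" might mean in practice. The page does not define the trigger, so the rule used here is purely an assumption for illustration: in a negative-advantage response, the first token whose importance ratio exceeds a cap is treated as the trigger, and that token and everything after it are removed from the loss. Whether the paper's veto masks the trigger token itself, and what its trigger condition is, are not stated here. The names `veto_mask` and `ratio_cap` are hypothetical.

```python
def veto_mask(ratios, advantage, ratio_cap=5.0):
    """Return a 0/1 loss mask with one entry per token.

    Hypothetical rule (an assumption, not the paper's definition): for a
    response with negative advantage, the first token whose importance ratio
    exceeds ratio_cap is the trigger; it and all later tokens are masked out.
    Responses with non-negative advantage are left untouched.
    """
    mask = [1] * len(ratios)
    if advantage >= 0:
        return mask
    for t, rho in enumerate(ratios):
        if rho > ratio_cap:
            for u in range(t, len(ratios)):
                mask[u] = 0
            break
    return mask


ratios = [1.0, 0.9, 6.2, 1.1]
negative_case = veto_mask(ratios, advantage=-0.7)
positive_case = veto_mask(ratios, advantage=0.7)
vetoed_fraction = 1 - sum(negative_case) / len(negative_case)
print(negative_case, positive_case, vetoed_fraction)
```

In this example the negative-advantage response keeps its first two tokens and loses the last two, while the same ratios under a positive advantage are left unmasked.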
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment point by point below. We agree that additional ablations, statistical reporting, and diagnostics will strengthen the manuscript and will incorporate them in the revision.
Point-by-point responses
- Referee: [Methods] Methods section (description of Mu-GRPO and the four-stage schedule): the claim that relaxed clipping together with negative-advantage veto reliably stabilizes optimization under high-staleness rollouts is load-bearing, yet the manuscript provides no ablations isolating each component, no gradient-norm or policy-divergence measurements, and no statistics on advantage distributions or rollout staleness to confirm the techniques control off-policyness effects without systematic bias.
  Authors: We acknowledge that isolating the individual contributions of relaxed clipping and negative-advantage veto through dedicated ablations would provide stronger support for the stabilization claim. In the revised manuscript we will add an ablation study that disables each component in turn while keeping the four-stage schedule fixed, and report the resulting training curves, final benchmark scores, and stability indicators. We will also include plots of gradient norms and approximate policy divergence (KL) across training steps, together with histograms and summary statistics of advantage values and measured rollout staleness (token-age distribution) for both Mu-GRPO and the baseline. These additions will directly address whether the techniques control off-policy effects without introducing systematic bias.
  Revision: yes
- Referee: [Experiments] Experiments / Results tables: benchmark scores are reported as matching or exceeding GRPO without error bars, confidence intervals, or statistical significance tests; exact hyper-parameter tables are also absent. This makes it impossible to assess whether the reported 2x speedup and performance parity are robust or could be affected by post-hoc benchmark selection.
  Authors: We agree that the current presentation lacks the statistical detail needed to evaluate robustness. In the revision we will rerun the main experiments with at least three independent seeds per model-benchmark pair, add error bars and 95% confidence intervals to all tables, and include paired statistical significance tests (e.g., Wilcoxon or t-tests) between Mu-GRPO and GRPO. A complete hyper-parameter table listing all generation, optimization, and scheduling values will be placed in the appendix. We will also explicitly state that the five models and math-reasoning benchmarks were selected prior to experimentation following the protocol used in prior RLVR literature, thereby ruling out post-hoc selection.
  Revision: yes
- Referee: [Methods] Stabilization subsection: the negative-advantage veto is asserted to remove only destabilizing post-trigger suffix updates, but without direct measurements of how the veto interacts with the sequential schedule or any sensitivity analysis, the risk of introducing bias or missing instability in the high-staleness regime remains unaddressed.
  Authors: We appreciate the referee's emphasis on direct validation of the veto mechanism. In the revised version we will add a dedicated analysis subsection that reports (i) the fraction of tokens vetoed per stage as a function of the sequential schedule, (ii) a sensitivity sweep over the veto threshold showing its effect on both final performance and training stability metrics, and (iii) a comparison of advantage distributions before and after veto application. These measurements will quantify how the veto interacts with the staged schedule and will allow readers to assess any residual risk of bias or undetected instability.
  Revision: yes
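The seed-level reporting promised in the responses above (error bars, 95% confidence intervals, paired tests) can be sketched with the standard library alone. The scores below are made-up placeholders and not results from the paper; the helper names `mean_ci95` and `paired_t` are hypothetical. The critical values are the usual two-sided 95% quantiles of Student's t distribution.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def mean_ci95(scores):
    """Return (mean, half-width of the 95% confidence interval) over seeds."""
    n = len(scores)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(scores) / math.sqrt(n)
    return statistics.mean(scores), half_width


def paired_t(scores_a, scores_b):
    """Paired t statistic for per-seed score differences (a minus b)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Placeholder accuracies for three seeds; NOT taken from the paper.
mu_grpo = [51.0, 53.0, 52.0]
grpo = [50.0, 51.0, 52.0]

mean, half_width = mean_ci95(mu_grpo)
t_stat = paired_t(mu_grpo, grpo)
significant = abs(t_stat) > T_CRIT_95[len(grpo) - 1]
print(mean, half_width, t_stat, significant)
```

With only three seeds the critical value is 4.303, so intervals are wide and a one-point average gap like the placeholder one does not reach significance. This is a reason to treat "at least three seeds" as a minimum when parity claims are at stake.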
Circularity Check
No circularity: empirical validation on external benchmarks with independent scoring
Full rationale
The paper proposes Mu-GRPO as an algorithmic organization into sequential generation-optimization stages combined with relaxed clipping and negative-advantage veto to tolerate higher rollout staleness. No equations or derivations are presented that reduce the reported performance or speedup claims to quantities defined by fitted constants, self-referential definitions, or prior self-citations within the paper. Results are measured on external math reasoning benchmarks whose evaluation is independent of the training procedure and fitted values. The central claim rests on end-to-end empirical matching rather than any load-bearing step that collapses to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of sequential stages
axioms (2)
- domain assumption: Relaxed clipping preserves useful gradients from stale rollouts
- domain assumption: Negative-advantage veto removes destabilizing suffix updates
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: relaxed clipping, which preserves useful stale-rollout gradients, with negative-advantage veto, which removes destabilizing post-trigger suffix updates
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: importance ratio $\rho_{i,t}(\theta) = \pi_\theta(a_{i,t} \mid s_{i,t}) / \beta(a_{i,t} \mid s_{i,t})$ ... $\min\big(\rho\, A_i,\ \mathrm{clip}(\rho, 0, 5)\, A_i\big)$
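The clipped surrogate in the quoted passage can be evaluated for a single token to see which side the clip acts on. This is a minimal sketch that takes the quoted bounds 0 and 5 at face value; the passage is an excerpt, so the paper's full objective may include terms or normalization not shown here. The function name and argument layout are assumptions.

```python
import math


def relaxed_clip_surrogate(logp_current, logp_behavior, advantage,
                           clip_low=0.0, clip_high=5.0):
    """Per-token surrogate min(rho * A, clip(rho, clip_low, clip_high) * A).

    rho is the importance ratio pi_theta / beta, computed from log-probabilities.
    The default bounds follow the quoted passage and are otherwise unverified.
    """
    rho = math.exp(logp_current - logp_behavior)
    clipped = min(max(rho, clip_low), clip_high)
    return min(rho * advantage, clipped * advantage)


# rho = 8 with positive advantage: the value is capped at 5 * A.
capped = relaxed_clip_surrogate(math.log(8.0), 0.0, advantage=1.0)
# rho = 8 with negative advantage: the unclipped term rho * A is the minimum.
uncapped = relaxed_clip_surrogate(math.log(8.0), 0.0, advantage=-1.0)
# rho = 0.5 with positive advantage: inside the range, so nothing is clipped.
inside = relaxed_clip_surrogate(math.log(0.5), 0.0, advantage=1.0)
print(capped, uncapped, inside)
```

Read this way, the clip bounds only positive-advantage tokens whose ratio exceeds 5. Negative-advantage tokens with large ratios stay unbounded in this expression, which is consistent with the page's description of a separate veto for negative-advantage responses.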
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.