MLLM-as-a-Judge Exhibits Model Preference Bias
Pith reviewed 2026-05-10 15:38 UTC · model grok-4.3
The pith
MLLM judges exhibit self-preference bias toward their own outputs and those from related model families.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Representative MLLMs tend to exhibit self-preference bias when acting as judges, with mutual preference bias within particular model families potentially driven by reused connectors and overlapping instruction-tuning resources; these biases can be quantified via Philautia-Eval and mitigated by an ensemble of MLLMs.
What carries the argument
Philautia-Eval, a method that disentangles model preference tendencies from genuine differences in generation quality using large-scale paired evaluations.
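The paper does not spell out the disentangling step here, but the idea can be sketched as a two-way decomposition (a hypothetical simplification, not the paper's formulation): model each judge-generator mean score as generator quality plus judge leniency plus a residual preference term, and read self-preference off the diagonal of the residual matrix.

```python
import numpy as np

# Hypothetical sketch of the disentangling idea (not the paper's exact
# procedure): decompose scores[j, g], the mean score judge j gives
# generator g, into quality(g) + leniency(j) + bias(j, g) by
# double-centering, then read self-preference off the diagonal.

def preference_bias(scores: np.ndarray) -> np.ndarray:
    """Residual bias(j, g) after removing judge and generator main effects."""
    grand = scores.mean()
    leniency = scores.mean(axis=1, keepdims=True) - grand   # per-judge offset
    quality = scores.mean(axis=0, keepdims=True) - grand    # per-generator quality
    return scores - grand - leniency - quality

def self_preference(scores: np.ndarray) -> float:
    """Mean diagonal residual: positive => judges favor their own outputs."""
    return float(np.diag(preference_bias(scores)).mean())
```

Double-centering removes any purely additive quality and leniency effects, so under this (strong) additivity assumption a nonzero diagonal mean is attributable to judge-specific preference.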
If this is right
- Single-MLLM judge benchmarks may systematically distort performance comparisons between models.
- Model families sharing training components show correlated biases in automatic evaluations.
- Ensemble judges like Pomms can serve as a practical way to reduce bias in evaluation pipelines.
- Evaluation protocols relying on MLLM judges require explicit checks for model-specific preferences.
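Pomms is described only as "a simple ensemble of MLLMs". One plausible minimal form, assumed here for illustration rather than taken from the paper, z-normalizes each judge's scores before averaging, so that no single judge's scale, leniency, or self-preference dominates the aggregate:

```python
import numpy as np

# Illustrative ensemble judge in the spirit of Pomms (the paper's exact
# aggregation rule is not given here): z-normalize each judge's scores
# to remove per-judge leniency and scale, then average across judges.

def ensemble_scores(scores: np.ndarray) -> np.ndarray:
    """scores[j, i]: judge j's score for caption i. Returns one score per caption."""
    mu = scores.mean(axis=1, keepdims=True)
    sd = scores.std(axis=1, keepdims=True) + 1e-8  # guard against zero variance
    return ((scores - mu) / sd).mean(axis=0)
```

Because each judge's contribution is centered and rescaled, one judge inflating its own outputs shifts the ensemble score far less than it shifts that judge's raw score.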
Where Pith is reading between the lines
- Analogous self-preference effects are likely present when using LLMs as judges in text-only settings.
- Developers could reduce downstream bias by diversifying connectors and instruction data across models.
- Extending Philautia-Eval to other modalities or tasks would test whether the bias pattern generalizes.
Load-bearing premise
Philautia-Eval successfully disentangles model preference tendencies from genuine differences in generation quality without introducing new artifacts.
What would settle it
An experiment where generation quality is first verified as equal by humans or independent metrics across models, then checking whether Philautia-Eval still detects preference biases.
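This control can be rehearsed in simulation (an illustration using a simple additive decomposition, not the paper's actual procedure): construct scores where generator quality is equal by construction, inject a known self-preference offset, and check that the decomposition reports the offset when present and near zero when absent.

```python
import numpy as np

# Simulated version of the proposed control (illustrative only):
# equal-quality generators by construction, with an optional known
# self-preference offset injected on the judge == generator diagonal.

def diag_residual(scores: np.ndarray) -> float:
    """Mean diagonal residual after removing judge and generator main effects."""
    r = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + scores.mean())
    return float(np.diag(r).mean())

rng = np.random.default_rng(0)
n = 6
equal_quality = np.full((n, n), 70.0) + rng.normal(0.0, 0.01, (n, n))
biased = equal_quality + 5.0 * np.eye(n)   # inject known self-preference

assert abs(diag_residual(equal_quality)) < 0.1   # no injected bias -> ~0
assert diag_residual(biased) > 3.0               # injected bias recovered
```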
Original abstract
Automatic evaluation using multimodal large language models (MLLMs), commonly referred to as MLLM-as-a-Judge, has been widely used to measure model performance. If such MLLM-as-a-Judge methods were biased, they could distort model comparisons and benchmark-driven scientific progress. However, it remains unclear to what extent MLLM-as-a-Judge methods favor or disfavor text generated by specific MLLMs. In this study, we propose Philautia-Eval to investigate such model-specific preference bias. Philautia-Eval quantifies the degree of the bias by disentangling preference tendencies from differences in generation quality. Using 1.29M caption-score pairs collected from 12 MLLMs, we found that representative MLLMs tend to exhibit self-preference bias. Moreover, experimental results indicate mutual preference bias within particular model families, which is potentially driven by reused connectors and overlapping instruction-tuning resources. Finally, we introduce a simple ensemble of MLLMs, Pomms. Our results demonstrated that Pomms effectively mitigated the model-specific preference bias while maintaining performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Philautia-Eval, a method to quantify model-specific preference bias in MLLM-as-a-Judge by disentangling preference tendencies from differences in generation quality. Using 1.29M caption-score pairs from 12 MLLMs, it reports self-preference bias in representative models and mutual preference bias within model families, potentially attributable to reused connectors and overlapping instruction-tuning data. It further introduces Pomms, a simple ensemble of MLLMs that mitigates the measured bias while preserving evaluation performance.
Significance. If the disentangling procedure in Philautia-Eval is robust, the work identifies a practically important limitation in the growing use of MLLMs for automatic multimodal evaluation, which could otherwise distort model rankings and benchmark-driven research. The scale of the study (1.29M pairs across 12 models) and the proposed mitigation via ensemble provide concrete, actionable contributions. The findings on family-wise bias also open avenues for understanding training-data overlap effects in multimodal models.
major comments (3)
- [§3] §3 (Philautia-Eval): The central claim that the method successfully disentangles preference bias from genuine quality differences rests on an unspecified normalization or regression step. No explicit equations, pseudocode, or ablation on residual correlation with judge training data are provided, leaving open the possibility that measured self-preference is partly an artifact of shared generation/scoring pipelines.
- [§4.2] §4.2 (Results on 1.29M pairs): The reported self-preference and family-wise mutual bias figures lack accompanying statistical controls (e.g., permutation tests, multiple-comparison correction across 12 models, or independent quality oracle) that would confirm the bias is not driven by unaccounted confounders in caption generation.
- [§5] §5 (Causal interpretation): The statement that mutual bias is 'potentially driven by reused connectors and overlapping instruction-tuning resources' is presented without any supporting analysis (data-overlap metrics, connector ablation, or controlled fine-tuning experiments), weakening the explanatory claim even if the bias measurement itself holds.
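The statistical controls asked for in the second comment are standard. A permutation test of one judge's self-preference could look like this sketch (illustrative, not the paper's analysis); across 12 judges, the resulting p-values would still need a multiple-comparison correction such as Holm-Bonferroni.

```python
import numpy as np

# Illustrative permutation test (not the paper's procedure): does a
# judge score its own captions higher than others'? Shuffle the
# own/other labels to build the null distribution of the mean gap.

def permutation_pvalue(own, other, n_perm=10000, seed=0):
    rng = np.random.default_rng(seed)
    own, other = np.asarray(own, float), np.asarray(other, float)
    observed = own.mean() - other.mean()
    pooled = np.concatenate([own, other])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pooled[:own.size].mean() - pooled[own.size:].mean() >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # one-sided, with add-one smoothing
```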
minor comments (2)
- [Abstract] Abstract and §2: The names 'Philautia-Eval' and 'Pomms' are introduced without expansion or motivation, which reduces immediate readability for readers unfamiliar with the Greek root or acronym.
- [Figure 2] Figure 2 or equivalent bias heatmap: Error bars or confidence intervals are missing from the per-model bias scores, making it difficult to judge the reliability of the reported differences.
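The missing error bars are cheap to add: a percentile bootstrap over a judge's per-caption bias values (a generic recipe, not taken from the paper) yields a confidence interval for each cell of the bias heatmap.

```python
import numpy as np

# Generic percentile-bootstrap confidence interval for a mean bias
# score (illustrative; the paper reports point estimates only).

def bootstrap_ci(values, n_boot=5000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    means = np.array([
        rng.choice(values, size=values.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return (float(np.quantile(means, alpha / 2)),
            float(np.quantile(means, 1 - alpha / 2)))
```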
Simulated Author's Rebuttal
We thank the referee for the constructive comments and suggestions. We provide point-by-point responses to the major comments below, indicating where revisions will be made to the manuscript.
-
Referee: [§3] §3 (Philautia-Eval): The central claim that the method successfully disentangles preference bias from genuine quality differences rests on an unspecified normalization or regression step. No explicit equations, pseudocode, or ablation on residual correlation with judge training data are provided, leaving open the possibility that measured self-preference is partly an artifact of shared generation/scoring pipelines.
Authors: We agree that the disentangling procedure requires more explicit documentation. In the revised version, we will add the full set of equations describing the normalization and regression steps used in Philautia-Eval, include pseudocode for the algorithm, and perform an ablation analysis to check for residual correlations with the training data of the judge models. This will address concerns about potential artifacts from shared pipelines. revision: yes
-
Referee: [§4.2] §4.2 (Results on 1.29M pairs): The reported self-preference and family-wise mutual bias figures lack accompanying statistical controls (e.g., permutation tests, multiple-comparison correction across 12 models, or independent quality oracle) that would confirm the bias is not driven by unaccounted confounders in caption generation.
Authors: We thank the referee for this valuable suggestion. We will enhance §4.2 by adding permutation tests to validate the significance of the bias measurements and apply appropriate multiple-comparison corrections for the 12 models. While we do not have an independent quality oracle in the current study, the large scale of the 1.29M caption-score pairs helps control for confounders; we will explicitly discuss this in the revision and note it as a limitation. revision: partial
-
Referee: [§5] §5 (Causal interpretation): The statement that mutual bias is 'potentially driven by reused connectors and overlapping instruction-tuning resources' is presented without any supporting analysis (data-overlap metrics, connector ablation, or controlled fine-tuning experiments), weakening the explanatory claim even if the bias measurement itself holds.
Authors: We recognize that the explanatory claim is not supported by direct analysis. In the revision, we will modify the language in §5 to present this as a hypothesis rather than a firm attribution, and we will include a discussion on how future work could use data-overlap metrics or ablations to investigate this. The core bias measurements remain valid independently of this interpretation. revision: yes
Circularity Check
No circularity: empirical bias measurement via new disentangling method on collected data
full rationale
The paper proposes Philautia-Eval as a new framework to quantify model-specific preference bias by disentangling it from generation quality differences, then applies it to an independently collected dataset of 1.29M caption-score pairs across 12 MLLMs. The self-preference and family-wise mutual bias findings are presented as direct experimental observations from this evaluation, with an additional ensemble method (Pomms) introduced to mitigate observed bias. No equations, fitted parameters, or self-citations are described that would reduce the bias quantification or central claims to tautological inputs by construction. The derivation chain consists of data collection followed by application of the proposed disentangling procedure, which is external to the measured outputs and does not invoke prior author work as a uniqueness theorem or ansatz. This is a standard empirical study without load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
invented entities (2)
- Philautia-Eval: no independent evidence
- Pomms: no independent evidence