Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review
Pith reviewed 2026-06-29 23:32 UTC · model grok-4.3
The pith
Rejoinder organizes defense of ICML 2023 ranking experiment around four themes
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors address the discussants' points by organizing their response around four core themes: formulating peer review as a statistical estimation problem; mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; incorporating complementary signals such as reviewer rankings and structured metadata; and exploring a human-centered framework for peer review in the era of generative AI.
What carries the argument
The four core themes used to structure the rejoinder and address discussants' concerns.
If this is right
- Peer review can be treated as a statistical estimation problem to improve ranking accuracy.
- The Isotonic Mechanism can be deployed after adding mitigations for equity and strategic behavior.
- Reviewer rankings and structured metadata can serve as useful complementary signals.
- A human-centered framework can guide peer review adaptations in the presence of generative AI.
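For readers unfamiliar with the Isotonic Mechanism named in the second bullet, a minimal Python sketch may help. It assumes the standard formulation from Su (2021), in which the author's self-reported ranking is treated as an order constraint and the raw review scores are projected onto that constraint by least squares. The function name, the input layout, and the example numbers are illustrative assumptions and are not taken from the rejoinder.

```python
from typing import List, Sequence


def isotonic_mechanism(scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """Adjust raw review scores so they agree with the author's ranking.

    scores[i] : raw (e.g. average) review score of paper i; hypothetical layout.
    ranking   : paper indices ordered from the author's best to worst.
    Returns the scores closest to `scores` in squared error that are
    non-increasing along `ranking` (pool-adjacent-violators algorithm).
    """
    if sorted(ranking) != list(range(len(scores))):
        raise ValueError("ranking must be a permutation of the paper indices")

    # Each block holds [sum of scores, number of papers]; its fitted value is the mean.
    blocks: List[List[float]] = []
    for idx in ranking:
        blocks.append([float(scores[idx]), 1])
        # Merge while a lower-ranked block has a higher mean than the block
        # before it, since that contradicts the claimed best-to-worst order.
        while len(blocks) > 1 and (
            blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    # Write each block's mean back to the papers it covers.
    adjusted = [0.0] * len(scores)
    position = 0
    for total, count in blocks:
        for _ in range(int(count)):
            adjusted[ranking[position]] = total / count
            position += 1
    return adjusted


# Illustrative numbers only: the author ranks paper 0 above 1 above 2,
# but reviewers scored paper 1 highest, so papers 0 and 1 are pooled.
print(isotonic_mechanism([5.0, 7.0, 6.0], [0, 1, 2]))
```

When the raw scores already agree with the ranking, the function returns them unchanged. The adjustment only acts where reviewer scores and the author's ordering conflict.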
Where Pith is reading between the lines
- The same four-theme structure might be reusable for rejoinders in other statistical studies of conference review processes.
- Testing the equity mitigations in a follow-up experiment at a different conference would provide direct evidence of their effectiveness.
- Integrating the human-centered AI framework could connect peer-review research to broader questions of automation in academic evaluation.
Load-bearing premise
That organizing the response around these four themes is sufficient to resolve the discussants' practical and theoretical concerns without requiring new empirical data, formal proofs, or direct rebuttals to specific counter-arguments.
What would settle it
A specific concern raised by one of the discussants that falls outside all four themes and is left unaddressed in the rejoinder.
Original abstract
This article is the rejoinder to "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review," to appear in the Journal of the American Statistical Association with discussion. To address the practical and theoretical points raised by the discussants, we organize our response around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a rejoinder to the discussion of "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" (to appear in JASA). It states that the response to the discussants' practical and theoretical points is organized around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.
Significance. If the thematic organization maps explicitly to and resolves the discussants' specific concerns, the rejoinder could usefully structure ongoing conversation on statistical framing and deployment issues in peer review. Its contribution is primarily organizational, however, as the abstract indicates no new empirical data, formal proofs, or direct rebuttals to individual counter-arguments.
Major comments (1)
- [Abstract] The claim that organizing the response around the four themes addresses the discussants' points assumes that broad thematic discussion can stand in for targeted mapping to specific counter-arguments, new empirical results, or formal analysis. The abstract gives no indication that such mapping or substantiation occurs, which leaves the central claim unsubstantiated.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on the abstract. We address it directly below and agree that a revision to clarify the mapping would strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] The claim that organizing the response around the four themes addresses the discussants' points assumes that broad thematic discussion can stand in for targeted mapping to specific counter-arguments, new empirical results, or formal analysis. The abstract gives no indication that such mapping or substantiation occurs, which leaves the central claim unsubstantiated.
Authors: We agree that the abstract could more explicitly signal how the four themes correspond to clusters of discussant comments. The full rejoinder text does organize responses to the practical and theoretical points raised, with each theme addressing groups of related concerns (e.g., statistical estimation framing covers modeling critiques; equity and strategic issues address deployment objections). However, the abstract itself does not detail this correspondence. We will revise the abstract to include a brief sentence noting that the themes are chosen to group and respond to specific classes of discussant feedback. As this is a rejoinder, we do not introduce new empirical data or formal proofs; the contribution remains organizational and synthetic.
Revision: yes
Circularity Check
No circularity: thematic rejoinder contains no derivations or fitted claims
The paper is a rejoinder that organizes discussion around four listed themes without any equations, statistical predictions, parameter fitting, or derivation chains. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear, because the content is a purely textual response to discussants rather than a mathematical or empirical claim that could reduce to its own inputs by construction. The structure is self-contained as a discussion piece and does not invoke uniqueness theorems or ansatzes.