GRASP: Deterministic argument ranking in interaction graphs
Pith reviewed 2026-05-20 11:54 UTC · model grok-4.3
The pith
Local pairwise judgments on argument attacks and supports produce more consistent global rankings than holistic LLM verdicts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRASP aggregates stable local interaction judgments into a global ranking via a convergent attack-defense propagation operator. Local pairwise judgments on attacks and supports are shown to be more reproducible across models than holistic verdicts. GRASP scores measure structural sufficiency, a defense-aware notion of argument robustness over the explicit interaction graph, and do not correlate with human convincingness labels.
What carries the argument
The convergent attack-defense propagation operator that iteratively updates argument strengths according to supporting and attacking relations until a unique ranking emerges.
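To make the idea of iterating until scores stop changing concrete, here is a minimal sketch of one possible propagation rule. It is not the paper's operator, which this page does not reproduce. The function names `propagate` and `rank`, the neutral score `base`, the `damping` factor, and the averaging over incoming edges are all assumptions chosen for illustration. With `damping` below 1, this particular rule is a contraction in the max norm, so the result does not depend on the starting scores, even on cyclic graphs.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]  # (source argument, target argument)


def propagate(
    arguments: List[str],
    attacks: Iterable[Edge],
    supports: Iterable[Edge],
    base: float = 0.5,
    damping: float = 0.5,
    tol: float = 1e-9,
    max_iter: int = 1000,
    init: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Iterate a hypothetical attack/support update until scores stabilise."""
    if not 0.0 <= damping < 1.0:
        raise ValueError("damping must lie in [0, 1) for the contraction bound")
    # Collect incoming edges per argument: attacks count as -1, supports as +1.
    incoming = {a: [] for a in arguments}
    for src, dst in attacks:
        incoming[dst].append((src, -1.0))
    for src, dst in supports:
        incoming[dst].append((src, 1.0))
    scores = dict(init) if init is not None else {a: base for a in arguments}
    for _ in range(max_iter):
        updated = {}
        for a in arguments:
            edges = incoming[a]
            # Net influence: supporter scores minus attacker scores.
            net = sum(sign * scores[src] for src, sign in edges)
            raw = base + damping * net / max(1, len(edges))
            updated[a] = min(1.0, max(0.0, raw))  # clamp to [0, 1]
        delta = max((abs(updated[a] - scores[a]) for a in arguments), default=0.0)
        scores = updated
        if delta < tol:
            break
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    """Order arguments by descending score, breaking ties by name."""
    return sorted(scores, key=lambda a: (-scores[a], a))
```

For a chain where B attacks A and C attacks B, `rank(propagate(["A", "B", "C"], [("B", "A"), ("C", "B")], []))` places the defended argument A above its attacker B, which is the defense-aware behaviour the summary describes.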
Load-bearing premise
Pairwise LLM judgments on attacks and supports remain stable across models and the propagation operator converges to a unique ranking that reflects argumentative structure rather than model artifacts.
What would settle it
Applying GRASP to the same debate graph with two different LLMs and obtaining substantially different final rankings would indicate that local judgments are not reproducible enough to support the method.
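One way to quantify "substantially different final rankings" is a rank correlation between the two orderings. The sketch below computes Kendall's tau for two tie-free rankings of the same arguments. The function name `kendall_tau` and the choice of this statistic are assumptions for illustration. This page does not state which measure the paper uses.

```python
from itertools import combinations
from typing import List


def kendall_tau(rank_a: List[str], rank_b: List[str]) -> float:
    """Kendall's tau between two tie-free rankings of the same items.

    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    if len(rank_a) != len(rank_b) or set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same items")
    if len(set(rank_a)) != len(rank_a):
        raise ValueError("rankings must not contain duplicates")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items")
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = 0
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value near 1 across model pairs would be consistent with the reproducibility claim, and values near 0 or below would count against it.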
Original abstract
Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging, a common LLM-as-a-Judge practice where a model provides a global verdict on a debate, suffers from substantial inter-model disagreement. We argue that this instability arises from collapsing a debate's complex interaction structure into a single opaque score. To address this, we propose GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates stable local interaction judgments into a global ranking via a convergent attack-defense propagation operator. We show that local interaction judgments are more reproducible than holistic rankings in LLM-as-a-Judge evaluations, allowing GRASP to produce more consistent global rankings. We further show that GRASP scores do not correlate with human "convincingness" labels, highlighting a vital sociotechnical distinction: GRASP does not measure persuasion, factuality, or rhetorical appeal, but structural sufficiency, a defense-aware notion of argument robustness over the explicit interaction graph. Overall, GRASP offers a transparent and auditable alternative to holistic LLM judging.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates local pairwise LLM judgments of attacks and supports in an argument interaction graph via a propagation operator to produce global argument rankings. It claims that local interaction judgments are more reproducible than holistic LLM-as-a-Judge verdicts, that the resulting GRASP rankings are more consistent, and that GRASP scores capture structural sufficiency rather than correlating with human convincingness or rhetorical appeal.
Significance. If the reproducibility claims and convergence properties hold with supporting evidence, GRASP would provide a transparent, auditable alternative to opaque holistic LLM judging by explicitly separating argumentative structure from persuasion. The distinction between structural robustness and human convincingness labels is a useful sociotechnical observation, though its impact depends on validation across models and graph structures.
major comments (3)
- Abstract: The central claim that 'local interaction judgments are more reproducible than holistic rankings' is asserted without any quantitative results, agreement metrics (e.g., Cohen's kappa or Krippendorff's alpha), error bars, or dataset details; this absence makes the reproducibility advantage impossible to evaluate from the provided text.
- Abstract: The attack-defense propagation operator is described as 'convergent' and producing a 'unique ranking,' yet no fixed-point theorem, contraction-mapping argument, initialization independence proof, or explicit handling of cycles (mutual attacks or support loops) is supplied; without these, the operator may depend on starting values or fail to yield a unique attractor on cyclic graphs, directly undermining the determinism and reproducibility claims.
- Abstract: The statement that GRASP scores 'do not correlate with human convincingness labels' is presented without the correlation coefficient, sample size, or statistical test used; this weakens the claim that GRASP measures structural sufficiency rather than model-specific artifacts.
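The agreement metrics requested in the first comment can be computed directly from two judges' labels on the same argument pairs. Below is a minimal sketch of Cohen's kappa. The function name `cohens_kappa` and the label set ("attack", "support", "none") are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import List


def cohens_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1.0 - expected)
```

Kappa handles exactly two raters. With more than two LLM judges, Krippendorff's alpha or averaged pairwise kappa would be the usual alternatives.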
minor comments (1)
- Abstract: The acronym expansion 'Gradual Ranking with Attacks and Support Propagation' is clear, but the manuscript should define the propagation operator formally (e.g., as an iterative update rule) in the main text with explicit notation for attack and support weights.
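As an illustration of the kind of formal statement the referee asks for, one generic shape for an iterative update rule, together with a sufficient condition for convergence, is sketched below. The symbols $b_i$ (base score), $w^{+}_{ji}$ and $w^{-}_{ji}$ (non-negative support and attack weights from argument $j$ to argument $i$), and the squashing function $\phi$ are assumptions introduced here. The paper's actual operator is not given on this page and may differ.

```latex
% Hypothetical update rule over scores s = (s_1, \dots, s_n):
s_i^{(t+1)} = F_i\bigl(s^{(t)}\bigr)
            = \phi\Bigl( b_i + \sum_{j} w^{+}_{ji}\, s_j^{(t)}
                             - \sum_{j} w^{-}_{ji}\, s_j^{(t)} \Bigr)

% Sufficient condition: \phi is 1-Lipschitz and the total incoming weight
% is bounded by some q < 1,
\max_i \sum_{j} \bigl( w^{+}_{ji} + w^{-}_{ji} \bigr) \le q < 1 .

% Then F is a contraction in the max norm:
\bigl\| F(s) - F(s') \bigr\|_\infty \le q \,\bigl\| s - s' \bigr\|_\infty ,

% so by the Banach fixed-point theorem F has a unique fixed point s^{*},
% and s^{(t)} \to s^{*} from every initial s^{(0)}.
```

Under this condition, cycles such as mutual attacks need no special handling, and the induced ranking is unique up to ties in $s^{*}$.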
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments correctly identify opportunities to make the abstract more self-contained while preserving the paper's core claims. We address each major comment below and will revise the abstract and, where appropriate, add supporting details or formal elements to the main text or appendix.
- Referee: Abstract: The central claim that 'local interaction judgments are more reproducible than holistic rankings' is asserted without any quantitative results, agreement metrics (e.g., Cohen's kappa or Krippendorff's alpha), error bars, or dataset details; this absence makes the reproducibility advantage impossible to evaluate from the provided text.
  Authors: We agree that the abstract would be strengthened by including a concise reference to the quantitative evidence. The full manuscript reports these results in Section 4, including agreement metrics (Cohen's kappa and Krippendorff's alpha) computed over multiple LLMs and datasets, with error bars from repeated trials. In the revision we will add a brief clause to the abstract summarizing the reproducibility improvement and directing readers to the experimental section for full metrics and dataset descriptions.
  Revision: yes
- Referee: Abstract: The attack-defense propagation operator is described as 'convergent' and producing a 'unique ranking,' yet no fixed-point theorem, contraction-mapping argument, initialization independence proof, or explicit handling of cycles (mutual attacks or support loops) is supplied; without these, the operator may depend on starting values or fail to yield a unique attractor on cyclic graphs, directly undermining the determinism and reproducibility claims.
  Authors: The manuscript currently supports convergence through extensive empirical evaluation on graphs containing cycles (Section 3). We acknowledge that a formal fixed-point argument is not present in the submitted version. In the revised manuscript we will add a short appendix containing a proof sketch based on contraction properties for the propagation operator and explicit handling of cycles via stabilization, thereby addressing the concern about initialization dependence and uniqueness.
  Revision: yes
- Referee: Abstract: The statement that GRASP scores 'do not correlate with human convincingness labels' is presented without the correlation coefficient, sample size, or statistical test used; this weakens the claim that GRASP measures structural sufficiency rather than model-specific artifacts.
  Authors: The full paper presents this analysis in Section 5, reporting Pearson and Spearman correlation coefficients near zero together with sample size and p-values from the statistical test. We will revise the abstract to include a short parenthetical reference to these statistics so that the sociotechnical distinction between structural sufficiency and human convincingness is supported directly in the abstract.
  Revision: yes
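The correlation statistics mentioned in the last response can be reproduced in a few lines. The sketch below computes Pearson and Spearman coefficients between a list of scores and a list of human labels using only the standard library. The function names are hypothetical, the inputs are whatever score and label vectors the reader supplies, and p-values are omitted because they require a statistics package.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A coefficient near zero only supports the "no correlation" reading when the sample size is reported alongside it, which is why the referee asks for both.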
Circularity Check
No significant circularity in GRASP derivation
The paper defines GRASP as a deterministic aggregation of local pairwise interaction judgments via a convergent attack-defense propagation operator. No equations or claims in the provided text reduce the operator's convergence or the resulting global ranking to a fitted parameter, self-citation chain, or definitional tautology. The reproducibility advantage over holistic judging is presented as an empirical observation rather than a constructed equivalence, and the distinction from human convincingness labels is explicitly non-correlative. The framework is therefore self-contained with independent content.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Local pairwise judgments on attacks and supports are stable and more reproducible than holistic verdicts.
- Domain assumption: The attack-defense propagation operator converges to a unique ranking.