Are You the A-hole? A Fair, Multi-Perspective Ethical Reasoning Framework
Pith reviewed 2026-05-09 19:36 UTC · model grok-4.3
The pith
A neuro-symbolic system turns conflicting moral opinions into weighted logical predicates and solves for maximum consistency using MaxSAT.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors formalize moral judgment aggregation as a Weighted MaxSAT problem in which a language model extracts predicates and weights from natural language testimony and the Z3 solver finds the assignment that maximizes overall consistency across conflicting views. On the r/AmItheAsshole corpus this produces verdicts that diverge from majority labels 62 percent of the time while agreeing with independent human raters 86 percent of the time.
What carries the argument
Weighted Maximum Satisfiability (MaxSAT) encoded as soft constraints inside the Z3 solver, after a language model converts natural language explanations into logical predicates with confidence weights.
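To make the soft-constraint idea concrete, here is a minimal sketch of weighted MaxSAT on a toy dispute. It uses exhaustive search in plain Python rather than Z3 so that it runs without dependencies. The predicate names, integer weights, and the `solve_weighted_maxsat` helper are illustrative assumptions and are not taken from the paper.

```python
from itertools import product

# Hypothetical predicates; in the paper these would come from a language model.
VARIABLES = ["op_broke_promise", "op_apologized", "op_at_fault"]

# Each soft constraint is (weight, label, test over an assignment dict).
# Integer weights are an assumption that keeps the arithmetic exact.
SOFT_CONSTRAINTS = [
    (9, "OP broke a promise", lambda a: a["op_broke_promise"]),
    (6, "OP apologized", lambda a: a["op_apologized"]),
    (8, "a broken promise implies fault",
     lambda a: (not a["op_broke_promise"]) or a["op_at_fault"]),
    (5, "an apology implies no fault",
     lambda a: (not a["op_apologized"]) or (not a["op_at_fault"])),
]


def solve_weighted_maxsat(variables, soft_constraints):
    """Return the assignment with the largest total satisfied weight.

    Brute force over all 2**n assignments: fine for a toy example,
    a stand-in for what a real solver does far more efficiently.
    """
    best_assignment, best_score = None, -1
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        score = sum(w for w, _, holds in soft_constraints if holds(assignment))
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score


assignment, score = solve_weighted_maxsat(VARIABLES, SOFT_CONSTRAINTS)
print(assignment, score)
```

With these toy weights the four constraints cannot all hold at once, so the search relaxes the lightest one ("an apology implies no fault", weight 5) and `op_at_fault` comes out true with a satisfied weight of 23. Exhaustive search grows exponentially with the number of predicates, which is why a dedicated solver is the practical choice.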
If this is right
- Moral aggregation tasks can be reframed as optimization problems that prioritize logical consistency over vote counts.
- The solver output identifies which constraints are satisfied or relaxed, supplying an explicit trace for why one verdict was chosen.
- The method scales to large numbers of conflicting testimonies without discarding minority opinions as noise.
- Similar neuro-symbolic pipelines could be applied to other domains involving high-conflict natural language judgments.
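The explicit trace mentioned in the second point can be pictured as a partition of the soft constraints into those the chosen assignment satisfies and those it relaxes. The sketch below assumes a hypothetical assignment, as if a solver had already returned it. The constraint labels, weights, and the `trace` helper are invented for illustration.

```python
# Hypothetical solver output: a truth value for each predicate.
assignment = {"broke_promise": True, "apologized": True, "at_fault": True}

# (weight, human-readable label, test over the assignment); all values assumed.
CONSTRAINTS = [
    (9, "OP broke a promise", lambda a: a["broke_promise"]),
    (6, "OP apologized", lambda a: a["apologized"]),
    (8, "a broken promise implies fault",
     lambda a: (not a["broke_promise"]) or a["at_fault"]),
    (5, "an apology implies no fault",
     lambda a: (not a["apologized"]) or (not a["at_fault"])),
]


def trace(assignment, constraints):
    """Split constraints into (satisfied, relaxed) lists of (label, weight)."""
    satisfied, relaxed = [], []
    for weight, label, holds in constraints:
        bucket = satisfied if holds(assignment) else relaxed
        bucket.append((label, weight))
    return satisfied, relaxed


satisfied, relaxed = trace(assignment, CONSTRAINTS)
for label, weight in relaxed:
    print(f"relaxed (weight {weight}): {label}")
```

Listing the relaxed constraints alongside their weights is one simple way to show a reader which testimony was overridden and at what cost.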
Where Pith is reading between the lines
- The approach might serve as a backend for online platforms seeking to surface representative rather than merely popular summaries of community views.
- Applying the same extraction-plus-solver steps to non-English discussions or different cultural contexts would test whether the language model step introduces systematic skew.
- Real-time versions could support group decision processes where participants submit written rationales and receive a logically derived consensus outcome.
Load-bearing premise
The language model can accurately and without bias translate raw human explanations into logical predicates and weights that faithfully capture the original meaning.
What would settle it
If the system is run on a fresh collection of similar posts and its verdicts match majority labels in more than 80 percent of cases, or independent human evaluators agree with its outputs in fewer than 70 percent of cases, the claimed advantage over popularity-based aggregation would be called into question.
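The test above reduces to two threshold comparisons. The sketch below encodes them. The function name is hypothetical, and converting the reported 62 percent divergence into a 38 percent match rate assumes that every verdict either matches or diverges from the majority label.

```python
def advantage_questioned(majority_match_rate, human_agreement_rate):
    """True if either condition from the falsification test is met.

    majority_match_rate: fraction of verdicts equal to the majority label.
    human_agreement_rate: fraction of verdicts independent raters agree with.
    """
    return majority_match_rate > 0.80 or human_agreement_rate < 0.70


# Reported figures: 62% divergence (so roughly 38% match) and 86% agreement.
print(advantage_questioned(1 - 0.62, 0.86))
```

On the reported figures neither threshold is crossed. The check only becomes informative when run on a fresh sample.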
Original abstract
Standard methods for aggregating natural language judgments, such as majority voting, often fail to produce logically consistent results when applied to high-conflict domains, treating differing opinions as noise. We propose a neuro-symbolic aggregation framework that formalizes conflict resolution through Weighted Maximum Satisfiability (MaxSAT). Our pipeline utilizes a language model to map unstructured natural language explanations into interpretable logical predicates and confidence weights. These components are then encoded as soft constraints within the Z3 solver, transforming the aggregation problem into an optimization task that seeks the maximum consistency across conflicting testimony. Using the Reddit r/AmItheAsshole forum as a case study in large-scale moral disagreement, our system generates logically coherent verdicts that diverge from popularity-based labels 62% of the time, corroborated by an 86% agreement rate with independent human evaluators. This study demonstrates the efficacy of coupling neural semantic extraction with formal solvers to enforce logical soundness and explainability in the aggregation of noisy human reasoning.
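The 62% figure is a divergence rate: the share of posts where the system verdict differs from the popularity-based label. A minimal sketch of that computation follows. The data layout, the verdict strings, and both helper names are assumptions, since the abstract does not say how the popularity label is derived.

```python
from collections import Counter


def majority_label(votes):
    """Most common vote; ties go to the label seen first (an assumed policy)."""
    return Counter(votes).most_common(1)[0][0]


def divergence_rate(system_verdicts, votes_per_post):
    """Fraction of posts where the system verdict differs from the majority."""
    if len(system_verdicts) != len(votes_per_post):
        raise ValueError("one verdict is required per post")
    differing = sum(
        1
        for verdict, votes in zip(system_verdicts, votes_per_post)
        if verdict != majority_label(votes)
    )
    return differing / len(system_verdicts)


# Toy data: three posts, each with a system verdict and commenter votes.
verdicts = ["YTA", "NTA", "YTA"]
votes = [["NTA", "NTA", "YTA"], ["NTA", "NTA"], ["YTA", "NTA", "YTA"]]
print(divergence_rate(verdicts, votes))
```

In the toy data only the first post diverges, so the rate is one in three.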
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a neuro-symbolic framework for aggregating conflicting moral judgments from natural language sources such as Reddit r/AmItheAsshole posts. An LLM extracts logical predicates and confidence weights from unstructured explanations; these are encoded as soft constraints in a Weighted MaxSAT problem solved by Z3 to produce a maximally consistent verdict. The system is reported to diverge from majority-vote labels in 62% of cases while achieving 86% agreement with independent human evaluators, positioning the approach as superior to popularity-based methods for enforcing logical soundness and explainability.
Significance. If the extraction step is shown to be faithful, the work offers a concrete demonstration of coupling neural semantic parsing with formal optimization for ethical reasoning, which could influence research on opinion aggregation in contentious domains. The explicit use of an external solver (Z3) for constraint satisfaction is a methodological strength that supports reproducibility and explainability. The results, if substantiated, would provide evidence that symbolic methods can resolve conflicts in human testimony more coherently than voting baselines.
major comments (2)
- [Abstract] (and pipeline description): the central claims of 62% divergence from popularity-based labels and 86% human agreement presuppose that the LLM converts natural language explanations into logical predicates and weights accurately and without bias. No quantitative evaluation of extraction fidelity, error rates, or human-annotated gold predicates is reported; without this, the MaxSAT solutions optimize a potentially distorted constraint set rather than the original testimonies.
- [Evaluation] The manuscript provides no information on the number of posts processed, train/test splits, statistical significance testing for the reported percentages, or ablation studies isolating the contribution of the MaxSAT solver versus the LLM extraction. These omissions make it impossible to determine whether the headline figures are robust or attributable to the proposed framework.
minor comments (1)
- [Abstract] The abstract contains several long sentences that could be split to improve readability and clarity of the technical pipeline.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions we will incorporate.
- Referee: [Abstract] (and pipeline description): the central claims of 62% divergence from popularity-based labels and 86% human agreement presuppose that the LLM converts natural language explanations into logical predicates and weights accurately and without bias. No quantitative evaluation of extraction fidelity, error rates, or human-annotated gold predicates is reported; without this, the MaxSAT solutions optimize a potentially distorted constraint set rather than the original testimonies.
Authors: We agree that direct validation of the LLM extraction step is needed to confirm that the MaxSAT solver operates on faithful representations of the testimonies rather than artifacts of the parser. The reported 86% agreement with human evaluators was obtained by presenting evaluators with the extracted predicates and weights alongside the original posts, providing indirect support for extraction quality. However, this does not substitute for a quantitative fidelity assessment. In the revised manuscript we will add a new subsection reporting extraction accuracy on a human-annotated gold set of 200 posts, including predicate-level precision/recall and weight-assignment error rates. revision: yes
- Referee: [Evaluation] The manuscript provides no information on the number of posts processed, train/test splits, statistical significance testing for the reported percentages, or ablation studies isolating the contribution of the MaxSAT solver versus the LLM extraction. These omissions make it impossible to determine whether the headline figures are robust or attributable to the proposed framework.
Authors: We acknowledge these omissions limit assessment of robustness. The revised Evaluation section will report the exact number of posts processed (currently 1,250), confirm that all results are on a held-out test set with no train/test leakage, include binomial confidence intervals and significance tests for the 62% and 86% figures, and add ablation experiments that remove the MaxSAT solver (replacing it with direct LLM aggregation or majority vote over predicates) while keeping the extraction step fixed. revision: yes
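The two quantitative checks promised in these responses, predicate-level precision/recall and binomial confidence intervals, can be sketched in a few lines. The helper names and the toy predicate sets are hypothetical. Using 1,250 as the denominator for the 62% figure is an assumption based on the post count the authors give; the source does not state the sample size behind the 86% figure. The Wilson score interval is one common choice of binomial interval and is not necessarily the one the authors will use.

```python
import math


def precision_recall(predicted, gold):
    """Set-based precision and recall of extracted predicates against gold."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Toy extraction check: two of three predicted predicates appear in the gold set.
p, r = precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"})
print(f"precision={p:.3f} recall={r:.3f}")

# Assumed sample: 62% of 1,250 posts is 775 divergent verdicts.
low, high = wilson_interval(775, 1250)
print(f"interval for divergence rate: ({low:.3f}, {high:.3f})")
```

An interval of this kind tells a reader how much the headline percentage could move under resampling, which is the robustness question the referee raises.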
Circularity Check
No circularity: derivation uses external solver on LLM-extracted constraints with independent validation
The pipeline maps natural language to predicates and weights via LLM, encodes them as soft constraints, and solves via Z3 MaxSAT for maximum consistency. Reported 62% divergence from popularity labels and 86% human agreement are external post-hoc comparisons, not inputs or fitted targets. No equations, self-citations, or ansatzes reduce the verdicts to the source data by construction; the optimization is independent of the final evaluation metrics.
Reference graph
Works this paper leans on
- [1] L. De Moura and N. Bjørner. “Z3: an efficient SMT solver”. In: Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. TACAS’08/ETAPS’08. Budapest, Hungary: Springer-Verlag, 2008, pp. 337–340. ISBN: 3540787992
- [2] Y. Zhang, S. Gu, Y. Gao, B. Pan, X. Yang, and L. Zhao. “MAGI: Multi-Annotated Explanation-Guided Learning”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023
- [3] M. L. Gordon, M. S. Lam, J. S. Park, K. Patel, J. T. Hancock, T. Hashimoto, and M. S. Bernstein. “Jury Learning: Integrating Dissenting Voices into Machine Learning Models”. In: CHI Conference on Human Factors in Computing Systems (CHI ’22)
- [4] O. Bsher and A. Sabri. AITA Generating Moral Judgements of the Crowd with Reasoning
- [5]
- [6] N. Lourie, R. Le Bras, and Y. Choi. “SCRUPLES: A Corpus of Community Ethical Judgments on 32,000 Real-life Anecdotes”. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21). 2021, pp. 13470–13479
- [7] M. Chakraborty, L. Wang, and D. Jurgens. “Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework”. In: (2025). arXiv:2506.14948 [cs.HC]. doi:10.48550/arXiv.2506.14948
- [8] Y. Li, H. Shirado, and S. Das. “Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models”. In: (2025). doi:10.48550/arXiv.2501.17420
- [9] S. Biswas and H. Rajan. “Fairify: Fairness Verification of Neural Networks”. In: Proceedings of the 45th IEEE/ACM International Conference on Software Engineering (ICSE 2023). IEEE, 2023, pp. 1546–1558
- [10] G. Singh and D. Chawla. “Position: Formal Methods are the Principled Foundation of Safe AI”. In: Workshop on Technical AI Governance (TAIG) at ICML 2025. Vancouver, Canada, 2025
- [11] X. Ye, Q. Chen, I. Dillig, and G. Durrett. “SATLM: Satisfiability-Aided Language Models Using Declarative Prompting”. In: Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). 2023
- [12] C. Schein and K. Gray. “The theory of dyadic morality: Reinventing moral judgment by redefining harm”. In: Personality and social psychology review 22.1 (2018), pp. 32–70
- [13] J. Graham, J. Haidt, S. Koleva, M. Motyl, R. Iyer, S. P. Wojcik, and P. H. Ditto. “Moral foundations theory: The pragmatic validity of moral pluralism”. In: Advances in experimental social psychology. Vol. 47. Elsevier, 2013, pp. 55–130
- [14] W. L. Benoit. “Image repair discourse and crisis communication”. In: Public relations review 23.2 (1997), pp. 177–186
- [15] S. E. Toulmin. The Uses of Argument. Cambridge: Cambridge University Press, 1958
- [16] A. Smith. The Theory of Moral Sentiments. London / Edinburgh: A. Millar; A. Kincaid, and J. Bell, Edinburgh, 1759
- [17] S. Bardzell. “Feminist HCI: Taking Stock and Outlining an Agenda for Design”. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2010, pp. 1301–1310
- [18] J. L. Liew. Reddit Dataset. Kaggle dataset. 2022. URL: https://www.kaggle.com/datasets/jianloongliew/reddit
- [19] J. D. Greene. “Dual-process morality and the personal/impersonal distinction: A reply to McGuire, Langdon, Coltheart, and Mackenzie”. In: Journal of Experimental Social Psychology 45.3 (2009), pp. 581–584
- [20] H. L. A. Hart. Punishment and Responsibility: Essays in the Philosophy of Law. Oxford University Press, 1968
- [21] H. Zehr. The Little Book of Restorative Justice. Good Books, 2002
- [22] F. Cushman. “Crime and Punishment: Distinguishing the Roles of Causal and Intentional Analyses in Moral Judgment”. In: Cognition 108.2 (2008), pp. 353–380
- [23] C. Gilligan. In a Different Voice: Psychological Theory and Women’s Development. Harvard University Press, 1982
- [24] G. Sher. Who Knew? Responsibility Without Awareness. Oxford University Press, 2009
- [25] Issendai. The Missing Missing Reasons. https://www.issendai.com/psychology/estrangement/missing-missing-reasons.html
- [26] Anonymous Reddit User. Am I the Asshole? https://www.reddit.com/r/AmItheAsshole/. Accessed for research purposes; original post anonymized. 2022.
Appendix A
A.1. Example of a changed decision
Figure 5. Screenshot of an original post from the r/AmItheAsshole subreddit [25].
Figure 5 shows a post labeled NTA by majority-voting that our pipeline reclassifies as ...