
arxiv: 2605.16280 · v1 · pith:OCNVJMTAnew · submitted 2026-04-10 · 💻 cs.CY · cs.AI

Beyond Imperfect Alternatives with Rulemapping: A Neuro-Symbolic Case Study on Online Hate Speech

Pith reviewed 2026-05-21 09:16 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords neuro-symbolic · legal automation · hate speech · content moderation · LLM · symbolic scaffolds · Rulemapping · German Criminal Code

The pith

Expert symbolic scaffolds constrain LLMs to achieve 0.80-0.86 precision in hate speech classification under German law.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether a neuro-symbolic hybrid can overcome the trade-off between the transparency of symbolic systems and the flexibility of neural ones in legal reasoning. It uses online hate speech moderation as a stand-in for high-volume legal decisions. Rulemapping's expert-authored logic trees guide the LLM so that it does not equate moral offensiveness with legal illegality. Reported precision rises to 0.80-0.86 from 0.34-0.49 for plain prompting, while recall stays at 0.82-0.89. A reader would care because the result points toward auditable AI systems that can meet regulatory demands in content moderation.

Core claim

Rulemapping operationalizes the legal syllogism with visual logic trees that serve as deterministic symbolic scaffolds for LLMs. When applied to assessing illegality under § 130(1) of the German Criminal Code, this hybrid approach raises precision to 0.80-0.86 from 0.34-0.49 for unconstrained prompting, while recall remains at 0.82-0.89 across various LLMs. The paper concludes that such expert-authored symbolic scaffolds enable robust legal automation aligned with requirements for auditability and verifiable decision-making.
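To make the size of the precision gap concrete, the sketch below converts the reported precision figures into implied false positives. The workload (100 truly illegal posts) and the shared recall of 0.85 are hypothetical values chosen for illustration; the abstract reports recall only for Rulemapping, and the function name `false_positives_implied` is not from the paper.

```python
def false_positives_implied(true_positives: float, precision: float) -> float:
    """Solve precision = TP / (TP + FP) for FP."""
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    return true_positives * (1.0 - precision) / precision


# Hypothetical workload, not taken from the paper.
ILLEGAL_POSTS = 100
RECALL = 0.85  # assumed equal for both systems, purely for illustration
true_positives = ILLEGAL_POSTS * RECALL

# Precision endpoints as reported in the abstract.
reported = [
    ("unconstrained, low end", 0.34),
    ("unconstrained, high end", 0.49),
    ("Rulemapping, low end", 0.80),
    ("Rulemapping, high end", 0.86),
]

for label, precision in reported:
    fp = false_positives_implied(true_positives, precision)
    print(f"{label:>24}: roughly {fp:.0f} lawful posts flagged "
          f"alongside {true_positives:.0f} correct flags")
```

Under these assumptions, the lower the precision, the more lawful posts are flagged for every correctly flagged illegal one, which is the practical cost that the precision figures summarise.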

What carries the argument

Rulemapping, a visual logic-tree method operationalising the classic legal syllogism to provide deterministic symbolic scaffolds for LLMs
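The general idea of a deterministic scaffold around narrow model judgments can be sketched as follows. This is not the paper's Rulemap: the tree shape, node names, and question wording are assumptions that loosely paraphrase elements of § 130(1) no. 1 StGB, and `make_stub_judge` is a hypothetical stand-in for an LLM call. Only the leaves consult the judge; the AND/OR combination is fixed code, and every intermediate result is recorded for audit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A leaf judge answers one narrow yes/no question about a text. In a
# neuro-symbolic pipeline this would wrap an LLM call; here it is any callable.
LeafJudge = Callable[[str, str], bool]

# Hypothetical leaf questions (simplified; not the paper's wording).
Q_TARGET = "Is the statement directed at a protected group or a member of one?"
Q_HATRED = "Does the statement incite hatred against that target?"
Q_MEASURES = "Does it call for violent or arbitrary measures against that target?"
Q_PEACE = "Is the statement suited to disturbing the public peace?"


@dataclass
class Node:
    name: str
    op: str = "LEAF"  # one of "LEAF", "AND", "OR"
    question: str = ""
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, text: str, judge: LeafJudge, trace: Dict[str, bool]) -> bool:
    """Evaluate the tree deterministically and record every node's result."""
    if node.op == "LEAF":
        result = judge(node.question, text)
    else:
        # Evaluate all children (no short-circuit) so the audit trace is complete.
        results = [evaluate(child, text, judge, trace) for child in node.children]
        if node.op == "AND":
            result = all(results)
        elif node.op == "OR":
            result = any(results)
        else:
            raise ValueError(f"unknown operator: {node.op}")
    trace[node.name] = result
    return result


RULE_TREE = Node("illegal", "AND", children=[
    Node("protected_target", question=Q_TARGET),
    Node("conduct", "OR", children=[
        Node("incites_hatred", question=Q_HATRED),
        Node("calls_for_measures", question=Q_MEASURES),
    ]),
    Node("public_peace", question=Q_PEACE),
])


def make_stub_judge(answers: Dict[str, bool]) -> LeafJudge:
    """Stand-in for an LLM: looks up a fixed answer for each question."""
    return lambda question, text: answers[question]


# An offensive post about a protected group that neither incites hatred nor
# calls for measures: the conduct element fails, so the verdict is "not illegal".
trace: Dict[str, bool] = {}
judge = make_stub_judge(
    {Q_TARGET: True, Q_HATRED: False, Q_MEASURES: False, Q_PEACE: True}
)
verdict = evaluate(RULE_TREE, "some offensive post", judge, trace)
print(verdict, trace)
```

The trace shows which statutory element decided the case, which is the kind of auditability the review attributes to explicit logic trees. Evaluating every child instead of short-circuiting costs extra judge calls but keeps the record complete.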

If this is right

  • Maintains high recall so that illegal content is not missed.
  • Reduces false positives by aligning decisions strictly with the legal standard.
  • Provides transparency and auditability through the explicit logic trees.
  • Works consistently across different underlying LLMs.
  • Supports compliance with regulations requiring verifiable decision processes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could extend to automating other areas of administrative law with similar high-volume decisions.
  • Over time, the symbolic scaffolds would need updating as legal interpretations evolve through court rulings.
  • Hybrid systems like this might offer a template for balancing AI capabilities with human oversight in regulated fields.

Load-bearing premise

The expert-authored symbolic logic trees correctly and exhaustively capture the legal standard in §130(1) without their own interpretive bias or incompleteness affecting classification outcomes.

What would settle it

Running the system on a new set of cases featuring legal edge cases or interpretations not fully represented in the logic trees would reveal if the precision advantage over unconstrained LLMs disappears.

Figures

Figures reproduced from arXiv: 2605.16280 by Oskar von Cossel.

Figure 1: Rulemap of § 130(1) no. 1 StGB (Incitement to ha…

Original abstract

Automating legal reasoning forces a choice between imperfect alternatives: symbolic systems offer transparency but struggle with ambiguity, whereas neural systems handle natural language flexibly but lack verifiability. This paper investigates whether a hybrid, neuro-symbolic approach can reconcile this trade-off. We evaluate this architecture in the domain of online content moderation, which serves as a proxy for high-volume legal decision-making such as mass administrative proceedings. In these settings, operators must assess thousands of cases daily under strict legal standards. Specifically, we examine whether constraining large language models (LLMs) within deterministic symbolic scaffolds improves statute-grounded illegality assessment while preventing "scope drift" (where LLMs conflate moral offensiveness with legal illegality). We evaluate the neuro-symbolic variant of Rulemapping - a visual logic-tree method that operationalises the classic legal syllogism - on online hate-speech classification under § 130(1) of the German Criminal Code. Across diverse LLMs, Rulemapping maintains high recall (0.82-0.89) while achieving precision of 0.80-0.86, compared to 0.34-0.49 for unconstrained prompting. Expert-authored symbolic scaffolds thus enable robust legal automation aligned with regulatory requirements for auditability and verifiable decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that constraining LLMs within expert-authored deterministic symbolic scaffolds (via the Rulemapping visual logic-tree method operationalizing the legal syllogism) reconciles the trade-off between symbolic transparency and neural flexibility for statute-grounded assessment of online hate speech under §130(1) of the German Criminal Code. It reports that this neuro-symbolic variant maintains high recall (0.82-0.89) while raising precision to 0.80-0.86 across diverse LLMs, versus 0.34-0.49 for unconstrained prompting, thereby enabling auditable and verifiable legal automation aligned with regulatory requirements.

Significance. If the central empirical result holds and the scaffolds accurately instantiate the statutory elements without interpretive narrowing, the work would demonstrate a practical hybrid architecture for high-volume legal decision-making. It would provide concrete evidence that expert symbolic constraints can reduce scope drift in LLMs while preserving verifiability, directly addressing auditability demands in automated content moderation and mass administrative proceedings.

major comments (2)
  1. The attribution of the precision lift (0.80-0.86 vs. 0.34-0.49) to improved legal fidelity rests on the assumption that the expert-authored logic trees correctly and exhaustively capture §130(1) without adding extra conjuncts or narrowing 'incitement' or 'hatred' beyond judicial practice. No section describes elicitation of the trees, cross-checking against BGH decisions or commentaries, or convergence across multiple legal experts; if the scaffolds are stricter than the actual standard, the gain is an artifact of reduced positive rate rather than better alignment.
  2. Evaluation section: the reported numeric gains lack supporting details on dataset construction (case selection and source), inter-annotator agreement for ground-truth labels, exact baseline prompting strategies, and statistical significance tests. These omissions make it impossible to rule out post-hoc scaffold selection or to assess whether the precision-recall trade-off generalizes beyond the evaluated instances.
minor comments (2)
  1. Abstract: specify the exact LLMs tested and the number of runs or seeds used to generate the 0.80-0.86 and 0.82-0.89 ranges.
  2. Results tables: add confidence intervals or p-values for the precision and recall comparisons to strengthen the cross-method claims.
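The confidence intervals requested in the minor comments could be computed along these lines. This is a minimal sketch using the Wilson score interval for a binomial proportion; the counts (85 correct flags out of 100) are hypothetical, since the review does not state the paper's sample sizes, and the choice of the Wilson interval is an assumption, not something the referee specifies.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 100 posts flagged, 85 of them truly illegal.
low, high = wilson_interval(85, 100)
print(f"precision 0.85, approx. 95% interval: [{low:.3f}, {high:.3f}]")
```

With only 100 flagged items the interval is wide, which illustrates why the referee asks for uncertainty estimates before comparing ranges across methods.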

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which help clarify key aspects of our neuro-symbolic approach. We address each major comment below and commit to revisions that strengthen the manuscript's transparency and rigor without altering its core claims.

Point-by-point responses
  1. Referee: The attribution of the precision lift (0.80-0.86 vs. 0.34-0.49) to improved legal fidelity rests on the assumption that the expert-authored logic trees correctly and exhaustively capture §130(1) without adding extra conjuncts or narrowing 'incitement' or 'hatred' beyond judicial practice. No section describes elicitation of the trees, cross-checking against BGH decisions or commentaries, or convergence across multiple legal experts; if the scaffolds are stricter than the actual standard, the gain is an artifact of reduced positive rate rather than better alignment.

    Authors: We appreciate the referee's emphasis on validating scaffold fidelity to the statutory standard. The logic trees were constructed by mapping the exact statutory elements of §130(1) (incitement to hatred or violence) onto the legal syllogism using standard doctrinal sources, without additional conjuncts. To address the gap, we will add a dedicated subsection to the Methods detailing the elicitation process, including direct references to relevant BGH decisions on the scope of 'incitement' and 'hatred' as well as alignment with leading commentaries. This documentation will confirm no interpretive narrowing occurred. The maintained high recall range (0.82-0.89) across models further indicates the precision gains are not an artifact of a reduced positive rate, as stricter scaffolds would typically depress recall. We disagree that the gain must be artifactual and will use the revision to make the alignment explicit.
    revision: yes

  2. Referee: Evaluation section: the reported numeric gains lack supporting details on dataset construction (case selection and source), inter-annotator agreement for ground-truth labels, exact baseline prompting strategies, and statistical significance tests. These omissions make it impossible to rule out post-hoc scaffold selection or to assess whether the precision-recall trade-off generalizes beyond the evaluated instances.

    Authors: We agree that fuller methodological transparency is required. The revised Evaluation section will include: explicit case selection criteria and data sources for the §130(1) instances; inter-annotator agreement statistics (Cohen's kappa) for the expert-provided ground-truth labels; precise specifications of the unconstrained baseline prompting strategies (zero-shot and few-shot variants); and statistical significance testing (e.g., McNemar's test) on the observed precision differences. These additions will allow readers to evaluate generalizability and exclude post-hoc selection concerns. The underlying data and annotation protocol are available for inclusion.
    revision: yes
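The two statistics named in the second response can be sketched in plain Python. The function names, toy labels, and counts below are hypothetical. Note that McNemar's test compares paired per-item correctness of two systems on the same items; how the authors would apply it to precision differences is not described, so the framing here is an assumption.

```python
from math import comb
from typing import Sequence


def cohens_kappa(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Expected agreement if both annotators labelled independently at their own rates.
    expected = sum(
        (list(labels_a).count(c) / n) * (list(labels_b).count(c) / n)
        for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (observed - expected) / (1.0 - expected)


def mcnemar_exact(only_a_correct: int, only_b_correct: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(only_a_correct, only_b_correct)
    # Under the null, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy data: 1 = "illegal", 0 = "not illegal".
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print("kappa:", cohens_kappa(annotator_1, annotator_2))

# Toy discordant counts: items only the scaffolded system got right vs. only the baseline.
print("McNemar p:", mcnemar_exact(10, 0))
```

The exact binomial form is used here because it needs no chi-square approximation and stays valid when discordant counts are small.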

Circularity Check

0 steps flagged

Empirical performance comparison with no derivation reducing to inputs by construction

The paper reports measured precision and recall on a classification task using expert-authored logic trees as fixed scaffolds. Performance numbers are obtained by direct comparison to external labels rather than by algebraic rearrangement or parameter fitting that forces the reported outcome. No equations appear in the abstract, and the central claim is an observed lift under the stated experimental protocol. The method is presented as an empirical case study rather than a first-principles derivation, so no self-definitional, fitted-input, or self-citation load-bearing circularity is present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that expert-crafted symbolic trees faithfully encode the legal rule and that the evaluation set is representative; no free parameters or invented physical entities are described.

axioms (1)
  • Domain assumption: Expert-authored symbolic scaffolds accurately represent the statutory standard without interpretive bias or incompleteness.
    Invoked when claiming that the constrained LLM produces statute-grounded decisions.

pith-pipeline@v0.9.0 · 5754 in / 1285 out tokens · 40651 ms · 2026-05-21T09:16:50.301506+00:00 · methodology
