Beyond Imperfect Alternatives with Rulemapping: A Neuro-Symbolic Case Study on Online Hate Speech

Oskar von Cossel

arxiv: 2605.16280 · v1 · pith:OCNVJMTAnew · submitted 2026-04-10 · 💻 cs.CY · cs.AI

Pith reviewed 2026-05-21 09:16 UTC · model grok-4.3

keywords: neuro-symbolic · legal automation · hate speech · content moderation · LLM · symbolic scaffolds · Rulemapping · German Criminal Code

The pith

Expert symbolic scaffolds constrain LLMs to achieve 0.80-0.86 precision in hate speech classification under German law.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether a neuro-symbolic hybrid can overcome the trade-off between the transparency of symbolic systems and the flexibility of neural ones in legal reasoning. It uses online hate speech moderation as a stand-in for high-volume legal decisions. Rulemapping's expert-authored logic trees guide the LLM through the statutory elements, which is meant to keep models from equating moral offensiveness with legal illegality. Reported precision rises from 0.34-0.49 under plain prompting to 0.80-0.86, while recall stays at 0.82-0.89. A reader would care because the result points toward auditable AI systems that can meet regulatory demands in content moderation.

Core claim

Rulemapping operationalizes the legal syllogism with visual logic trees that serve as deterministic symbolic scaffolds for LLMs. When applied to assessing illegality under § 130(1) of the German Criminal Code, this hybrid approach raises precision to 0.80-0.86 from 0.34-0.49 for unconstrained prompting, while recall remains at 0.82-0.89 across various LLMs. The paper concludes that such expert-authored symbolic scaffolds enable robust legal automation aligned with requirements for auditability and verifiable decision-making.
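To make the two metrics concrete, here is a minimal sketch of how precision and recall follow from confusion-matrix counts. The counts are invented for illustration and are not taken from the paper; they are chosen only so that the resulting values fall inside the reported ranges. The function name `precision_recall` is likewise hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample containing 100 posts that are illegal under the statute.
# Both systems find 85 of them; they differ only in how many legal posts they flag.
unconstrained = precision_recall(tp=85, fp=120, fn=15)  # many offensive-but-legal posts flagged
scaffolded = precision_recall(tp=85, fp=17, fn=15)      # far fewer false positives

print("unconstrained:", tuple(round(v, 2) for v in unconstrained))
print("scaffolded:   ", tuple(round(v, 2) for v in scaffolded))
```

In this toy setting recall is identical for both systems, so the entire precision gap comes from false positives, which is the pattern the core claim describes.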

What carries the argument

Rulemapping, a visual logic-tree method operationalising the classic legal syllogism to provide deterministic symbolic scaffolds for LLMs
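The idea of a deterministic scaffold can be sketched as a rooted tree of AND, OR, and XOR nodes that is evaluated bottom-up from the leaves to the root. The sketch below is an assumption-laden illustration, not the paper's implementation: the class names, the `evaluate` function, the toy tree, and the leaf wording are all hypothetical, and the leaf answers come from a hard-coded stub where a real system would query an LLM. Reading n-ary XOR as "exactly one child is true" is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class Leaf:
    """A yes/no question answered by a model or a human."""
    question: str


@dataclass
class Node:
    """An inner node combining its children with AND, OR, or XOR."""
    op: str
    children: List["Tree"] = field(default_factory=list)


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, answer: Callable[[str], bool],
             trace: Optional[List[Tuple[str, bool]]] = None) -> bool:
    """Evaluate bottom-up; every child is visited so the audit trace is complete."""
    if isinstance(tree, Leaf):
        label, result = tree.question, bool(answer(tree.question))
    else:
        values = [evaluate(child, answer, trace) for child in tree.children]
        if tree.op == "AND":
            result = all(values)
        elif tree.op == "OR":
            result = any(values)
        elif tree.op == "XOR":
            result = sum(values) == 1  # assumption: exactly one child is true
        else:
            raise ValueError(f"unknown operator: {tree.op}")
        label = tree.op
    if trace is not None:
        trace.append((label, result))
    return result


# Toy tree with simplified, hypothetical leaf wording (not the paper's scaffold).
toy_tree = Node("AND", [
    Leaf("targets a protected group"),
    Node("OR", [
        Leaf("incites hatred"),
        Leaf("calls for violent or arbitrary measures"),
    ]),
    Leaf("suited to disturbing the public peace"),
])


def stub_answers(question: str) -> bool:
    """Stand-in for an LLM call: an offensive post that meets no conduct element."""
    return {
        "targets a protected group": True,
        "incites hatred": False,
        "calls for violent or arbitrary measures": False,
        "suited to disturbing the public peace": True,
    }[question]


audit: List[Tuple[str, bool]] = []
verdict = evaluate(toy_tree, stub_answers, audit)
print(verdict)
for label, value in audit:
    print(f"{label}: {value}")
```

Because the combination logic is fixed code rather than model output, the model only decides individual leaves, and the trace records which element caused a negative verdict.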

If this is right

Maintains high recall so that illegal content is not missed.
Reduces false positives by aligning decisions strictly with the legal standard.
Provides transparency and auditability through the explicit logic trees.
Works consistently across different underlying LLMs.
Supports compliance with regulations requiring verifiable decision processes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could extend to automating other areas of administrative law with similar high-volume decisions.
Over time, the symbolic scaffolds would need updating as legal interpretations evolve through court rulings.
Hybrid systems like this might offer a template for balancing AI capabilities with human oversight in regulated fields.

Load-bearing premise

The expert-authored symbolic logic trees correctly and exhaustively capture the legal standard in §130(1) without their own interpretive bias or incompleteness affecting classification outcomes.

What would settle it

Running the system on a new set of cases featuring legal edge cases or interpretations not fully represented in the logic trees would reveal if the precision advantage over unconstrained LLMs disappears.

Figures

Figures reproduced from arXiv: 2605.16280 by Oskar von Cossel.

Original abstract

Automating legal reasoning forces a choice between imperfect alternatives: symbolic systems offer transparency but struggle with ambiguity, whereas neural systems handle natural language flexibly but lack verifiability. This paper investigates whether a hybrid, neuro-symbolic approach can reconcile this trade-off. We evaluate this architecture in the domain of online content moderation, which serves as a proxy for high-volume legal decision-making such as mass administrative proceedings. In these settings, operators must assess thousands of cases daily under strict legal standards. Specifically, we examine whether constraining large language models (LLMs) within deterministic symbolic scaffolds improves statute-grounded illegality assessment while preventing "scope drift" (where LLMs conflate moral offensiveness with legal illegality). We evaluate the neuro-symbolic variant of Rulemapping, a visual logic-tree method that operationalises the classic legal syllogism, on online hate-speech classification under § 130(1) of the German Criminal Code. Across diverse LLMs, Rulemapping maintains high recall (0.82-0.89) while achieving precision of 0.80-0.86, compared to 0.34-0.49 for unconstrained prompting. Expert-authored symbolic scaffolds thus enable robust legal automation aligned with regulatory requirements for auditability and verifiable decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that constraining LLMs within expert-authored deterministic symbolic scaffolds (via the Rulemapping visual logic-tree method operationalizing the legal syllogism) reconciles the trade-off between symbolic transparency and neural flexibility for statute-grounded assessment of online hate speech under §130(1) of the German Criminal Code. It reports that this neuro-symbolic variant maintains high recall (0.82-0.89) while raising precision to 0.80-0.86 across diverse LLMs, versus 0.34-0.49 for unconstrained prompting, thereby enabling auditable and verifiable legal automation aligned with regulatory requirements.

Significance. If the central empirical result holds and the scaffolds accurately instantiate the statutory elements without interpretive narrowing, the work would demonstrate a practical hybrid architecture for high-volume legal decision-making. It would provide concrete evidence that expert symbolic constraints can reduce scope drift in LLMs while preserving verifiability, directly addressing auditability demands in automated content moderation and mass administrative proceedings.

major comments (2)

The attribution of the precision lift (0.80-0.86 vs. 0.34-0.49) to improved legal fidelity rests on the assumption that the expert-authored logic trees correctly and exhaustively capture §130(1) without adding extra conjuncts or narrowing 'incitement' or 'hatred' beyond judicial practice. No section describes elicitation of the trees, cross-checking against BGH decisions or commentaries, or convergence across multiple legal experts; if the scaffolds are stricter than the actual standard, the gain is an artifact of reduced positive rate rather than better alignment.
Evaluation section: the reported numeric gains lack supporting details on dataset construction (case selection and source), inter-annotator agreement for ground-truth labels, exact baseline prompting strategies, and statistical significance tests. These omissions make it impossible to rule out post-hoc scaffold selection or to assess whether the precision-recall trade-off generalizes beyond the evaluated instances.

minor comments (2)

Abstract: specify the exact LLMs tested and the number of runs or seeds used to generate the 0.80-0.86 and 0.82-0.89 ranges.
Results tables: add confidence intervals or p-values for the precision and recall comparisons to strengthen the cross-method claims.
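As an illustration of the confidence intervals requested above, the following sketch computes a Wilson score interval for a proportion such as precision. The choice of the Wilson interval is an assumption made here, not something the paper or the referee specifies, and the counts are invented.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical: 85 of 102 posts flagged as illegal were truly illegal.
low, high = wilson_interval(successes=85, trials=102)
print(f"precision = {85 / 102:.3f}, interval = ({low:.3f}, {high:.3f})")
```

With around a hundred flagged posts the interval spans well over ten percentage points, which is why the sample size matters when comparing precision ranges across models.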

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which help clarify key aspects of our neuro-symbolic approach. We address each major comment below and commit to revisions that strengthen the manuscript's transparency and rigor without altering its core claims.

Point-by-point responses

Referee: The attribution of the precision lift (0.80-0.86 vs. 0.34-0.49) to improved legal fidelity rests on the assumption that the expert-authored logic trees correctly and exhaustively capture §130(1) without adding extra conjuncts or narrowing 'incitement' or 'hatred' beyond judicial practice. No section describes elicitation of the trees, cross-checking against BGH decisions or commentaries, or convergence across multiple legal experts; if the scaffolds are stricter than the actual standard, the gain is an artifact of reduced positive rate rather than better alignment.

Authors: We appreciate the referee's emphasis on validating scaffold fidelity to the statutory standard. The logic trees were constructed by mapping the exact statutory elements of §130(1) (incitement to hatred or violence) onto the legal syllogism using standard doctrinal sources, without additional conjuncts. To address the gap, we will add a dedicated subsection to the Methods detailing the elicitation process, including direct references to relevant BGH decisions on the scope of 'incitement' and 'hatred' as well as alignment with leading commentaries. This documentation will confirm no interpretive narrowing occurred. The maintained high recall range (0.82-0.89) across models further indicates the precision gains are not an artifact of a reduced positive rate, as stricter scaffolds would typically depress recall. We disagree that the gain must be artifactual and will use the revision to make the alignment explicit.
Revision: yes
Referee: Evaluation section: the reported numeric gains lack supporting details on dataset construction (case selection and source), inter-annotator agreement for ground-truth labels, exact baseline prompting strategies, and statistical significance tests. These omissions make it impossible to rule out post-hoc scaffold selection or to assess whether the precision-recall trade-off generalizes beyond the evaluated instances.

Authors: We agree that fuller methodological transparency is required. The revised Evaluation section will include: explicit case selection criteria and data sources for the §130(1) instances; inter-annotator agreement statistics (Cohen's kappa) for the expert-provided ground-truth labels; precise specifications of the unconstrained baseline prompting strategies (zero-shot and few-shot variants); and statistical significance testing (e.g., McNemar's test) on the observed precision differences. These additions will allow readers to evaluate generalizability and exclude post-hoc selection concerns. The underlying data and annotation protocol are available for inclusion.
Revision: yes
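The two statistics named in the rebuttal can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the labels and discordant counts are invented, the function names are hypothetical, and McNemar's test is shown in its exact binomial form applied to paired per-item correctness of two systems on the same posts, which may differ from the variant the authors intend.

```python
from math import comb
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Chance-corrected agreement between two annotators labelling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: items only system A classified correctly; c: items only system B classified correctly.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical annotations (1 = illegal, 0 = not illegal) for ten posts.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)

# Hypothetical discordant pairs: baseline alone correct on 2 posts, scaffold alone on 12.
p_value = mcnemar_exact(b=2, c=12)
print(f"kappa = {kappa:.3f}, McNemar p = {p_value:.4f}")
```

Only the discordant pairs enter McNemar's test, so posts that both systems classify correctly or both misclassify do not affect the p-value.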

Circularity Check

0 steps flagged

Empirical performance comparison with no derivation reducing to inputs by construction

full rationale

The paper reports measured precision and recall on a classification task using expert-authored logic trees as fixed scaffolds. Performance numbers are obtained by direct comparison to external labels rather than by algebraic rearrangement or parameter fitting that forces the reported outcome. No equations appear in the abstract, and the central claim is an observed lift under the stated experimental protocol. The method is presented as an empirical case study rather than a first-principles derivation, so no self-definitional, fitted-input, or self-citation load-bearing circularity is present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that expert-crafted symbolic trees faithfully encode the legal rule and that the evaluation set is representative; no free parameters or invented physical entities are described.

axioms (1)

domain assumption: Expert-authored symbolic scaffolds accurately represent the statutory standard without interpretive bias or incompleteness.
Invoked when claiming that the constrained LLM produces statute-grounded decisions.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction and embed theorems · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Rulemapping formalises legal reasoning through structured logical decomposition... rooted trees of logical nodes... AND (∧), OR (∨), XOR (⊻)... bottom-up from the leaves to the root.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

unclear: Relation between the paper passage and the cited Recognition theorem.

Expert-authored symbolic scaffolds thus enable robust legal automation aligned with regulatory requirements for auditability

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Stephan Anstötz. 2025. § 130 StGB - Volksverhetzung.Münchener Kommentar zum StGB(2025)

work page 2025

[2] [2]

Farid Ariai, Joel Mackenzie, and Gianluca Demartini. 2025. Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges.ACM Comput. Surv.58, 6 (Dec. 2025), 163:1–163:37. doi:10.1145/ 3777009

work page 2025

[3] [3]

Trevor Bench-Capon and M. Sergot. 1988. Towards a Rule-Based Representation of Open Texture in Law.Computing Power and Legal Language(1988), 39–60

work page 1988

[4] [4]

Michael James Bommarito, Jillian Bommarito, and Daniel Martin Katz. 2025. The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models. social science research network:5211933 doi:10.2139/ssrn.5211933

work page doi:10.2139/ssrn.5211933 2025

[5] [5]

2021.Rechtshandbuch Legal Tech(2 ed.)

Stephan Breidenbach, Florian Glatz, Tom Braegelmann, Philipp Caba, and Alexan- dra Dietzen. 2021.Rechtshandbuch Legal Tech(2 ed.). C.H. Beck, München

work page 2021

[6] [6]

Markus Conrads and Sascha Schweitzer. 2025. Juristische Problemlösung Mit KI – Leistung Und Grenzen Großer Sprachmodelle.Neue Juristische Wochenschrift 40 (2025), 2888–2891. https://beck-online.beck.de/Bcid/Y-300-Z-NJW-B-2025- S-2888-N-1

work page 2025

[7] [7]

Matthew Dahl, Varun Magesh, Mirac Suzgun, and Daniel E Ho. 2024. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models.Journal of Legal Analysis16, 1 (Jan. 2024), 64–93. doi:10.1093/jla/laae003

work page doi:10.1093/jla/laae003 2024

[8] [8]

DeepSeek-AI et al. 2025. DeepSeek-V3 Technical Report. arXiv:2412.19437 [cs] doi:10.48550/arXiv.2412.19437

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2412.19437 2025

[9] [9]

Christoph Demus et al . 2022. DeTox: A Comprehensive Dataset for Ger- man Offensive Language and Conversation Analysis. InProceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, and Zeerak Talat (Eds.). Association for Computational Linguistics, Seattle, Washington (Hybrid), 143–

work page 2022

[10] [10]

doi:10.18653/v1/2022.woah-1.14

work page doi:10.18653/v1/2022.woah-1.14 2022

[11] [11]

Christian Djeffal. 2014. A Commentary on Commentaries: The Wissenschaftsrat on Legal Commentaries and Beyond.Verfassungsblog(June 2014). doi:10.17176/ 20181005-163652-0

work page 2014

[12] [12]

Niklas Eder. 2024. Making Systemic Risk Assessments Work: How the DSA Creates a Virtuous Loop to Address the Societal Harms of Content Moderation. German Law Journal25, 7 (Oct. 2024), 1197–1218. doi:10.1017/glj.2024.24

work page doi:10.1017/glj.2024.24 2024

[13] [13]

Darren Edge et al. 2025. From Local to Global: A Graph RAG Approach to Query- Focused Summarization. arXiv:2404.16130 [cs] doi:10.48550/arXiv.2404.16130

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.16130 2025

[14] [14]

Yu Fan et al. 2025. LEXam: Benchmarking Legal Reasoning on 340 Law Exams. arXiv:2505.12864 [cs] doi:10.48550/arXiv.2505.12864

work page doi:10.48550/arxiv.2505.12864 2025

[15] [15]

Keim, and Maximilian T

Daniel Fürst, Mennatallah El-Assady, Daniel A. Keim, and Maximilian T. Fis- cher. 2025. Challenges and Opportunities for Visual Analytics in Jurisprudence. Artificial Intelligence and Law(Nov. 2025). doi:10.1007/s10506-025-09494-2

work page doi:10.1007/s10506-025-09494-2 2025

[16] [16]

Clement Guitton, Aurelia Tamò-Larrieux, Simon Mayer, and Gijs van Dijck. 2025. The Challenge of Open-Texture in Law.Artificial Intelligence and Law33, 2 (June 2025), 405–435. doi:10.1007/s10506-024-09390-1

work page doi:10.1007/s10506-024-09390-1 2025

[17] [17]

Keyan Guo, Alexander Hu, Jaden Mu, Ziheng Shi, Ziming Zhao, Nishant Vish- wamitra, and Hongxin Hu. 2023. An Investigation of Large Language Mod- els for Real-World Hate Speech Detection. In2023 International Conference on Machine Learning and Applications (ICMLA). IEEE, 1568–1573. https: //ieeexplore.ieee.org/abstract/document/10459901/

work page arXiv 2023

[18] [18]

Hanjo Hamann. 2021. Transparenz der Justiz: Stagnation seit 50 Jahren.Legal Tribune Online(July 2021). https://www.lto.de/persistent/a_id/45370

[19]

H. L. A. Hart. 1961. The Concept of Law. In The Concept of Law. Oxford University Press. doi:10.2307/2217213

[20]

Daniel Kahneman. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.

[21]

Manuj Kant, Manav Kant, Marzieh Nabi, Preston Carlson, and Megan Ma. 2024. Equitable Access to Justice: Logical LLMs Show Promise. arXiv:2410.09904 [cs] doi:10.48550/arXiv.2410.09904

[22]

Manuj Kant, Sareh Nabi, Manav Kant, Roland Scharrer, Megan Ma, and Marzieh Nabi. 2025. Towards Robust Legal Reasoning: Harnessing Logical LLMs in Law. arXiv:2502.17638 [cs] doi:10.48550/arXiv.2502.17638

[23]

Henry Kautz. 2022. The Third AI Summer: AAAI Robert S. Engelmore Memorial Lecture. AI Magazine 43, 1 (March 2022), 105–125. doi:10.1002/aaai.12036

[24]

Karl Larenz and Claus-Wilhelm Canaris. 1995. Methodenlehre Der Rechtswissenschaft. Springer, Berlin, Heidelberg. doi:10.1007/978-3-662-08709-1

[25]

Florian Ludwig, Torsten Zesch, and Frederike Zufall. 2025. Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech. In Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers, Christian Wartena and Ulrich Heid (Eds.). HsH Applied Academics, Hannover, Germany, 154–167. https://ac...

[26]

Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, and Daniel E. Ho. 2025. Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools. Journal of Empirical Legal Studies 22, 2 (2025), 216–242. doi:10.1111/jels.12413

[27]

Scott McLachlan, Evangelia Kyrimi, Kudakwashe Dube, Norman Fenton, and Lisa C. Webley. 2022. Lawmaps: Enabling Legal AI Development through Visualisation of the Implicit Structure of Legislation and Lawyerly Process. Artificial Intelligence and Law 31, 1 (March 2022), 169–194. doi:10.1007/s10506-021-09298-0

[28]

Scott McLachlan and Lisa C Webley. 2021. Visualisation of Law and Legal Process: An Opportunity Missed. Information Visualization 20, 2-3 (July 2021), 192–204. doi:10.1177/14738716211012608

[29]

Masha Medvedeva and Pauline Mcbride. 2023. Legal Judgment Prediction: If You Are Going to Do It, Do It Right. In Proceedings of the Natural Legal Language Processing Workshop 2023, Daniel Preoţiuc-Pietro, Catalina Goanta, Ilias Chalkidis, Leslie Barrett, Gerasimos Spanakis, and Nikolaos Aletras (Eds.). Association for Computational Linguistics, Singapore,...

[30]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems, Vol. 26. Curran Associates, Inc. https://papers.nips.cc/paper_files/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html

[32]

Mistral AI Team. 2024. Pixtral Large and Mistral Large 2.1. https://mistral.ai/news/pixtral-large

[33]

Wolfgang Mitsch. 2011. Volksverhetzung gegen Deutsche. Juristische Rundschau 2011, 9 (2011), 380–382. doi:10.1515/juru.2011.380

[34]

Robert N Moles. 1991. Logic Programming - An Assessment Of Its Potential For Artificial Intelligence Applications In Law. Journal of Law and Information Science 2, 2 (1991), 137–164. https://heinonline.org/HOL/P?h=hein.journals/jlinfos2&i=145

[35]

Jack Mumford, Katie Atkinson, and Trevor Bench-Capon. 2022. Reasoning with Legal Cases: A Hybrid ADF-ML Approach. In Legal Knowledge and Information Systems. IOS Press, 93–102. doi:10.3233/FAIA220452

[36]

Ha-Thanh Nguyen, Vu Tran, Ngoc-Cam Le, Thi-Thuy Le, Quang-Huy Nguyen, Le-Minh Nguyen, and Ken Satoh. 2022. Law to Binary Tree – An Formal Interpretation of Legal Natural Language. In Proceedings of the International Workshop on Methodologies for Translating Legal Norms into Formal Representations. arXiv, Saarbrücken. arXiv:2212.08335 [cs] doi:10.48550/ar...

[37]

OpenAI et al. 2024. GPT-4o System Card. arXiv:2410.21276 [cs] doi:10.48550/arXiv.2410.21276

[38]

OpenAI et al. 2024. OpenAI O1 System Card. arXiv:2412.16720 [cs] doi:10.48550/arXiv.2412.16720

[39]

OpenAI et al. 2025. Gpt-Oss-120b & Gpt-Oss-20b Model Card. arXiv:2508.10925 [cs] doi:10.48550/arXiv.2508.10925

[40]

Cecilia Panigutti et al. 2023. The Role of Explainable AI in the Context of the AI Act. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 1139–1150. doi:10.1145/3593013.3594069

[41]

Dasha Pruss and Jessie Allen. 2025. Against AI Jurisprudence: Large Language Models and the False Promises of Empirical Judging. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8, 3 (Oct. 2025), 2055–2066. doi:10.1609/aies.v8i3.36695

[42]

Chris Reed and Glenn Rowe. 2004. Araucaria: Software for Argument Analysis, Diagramming and Representation. International Journal on Artificial Intelligence Tools 13, 04 (Dec. 2004), 961–979. doi:10.1142/S0218213004001922

[43]

Ken Satoh. 2023. PROLEG: Practical Legal Reasoning System. In Prolog: The Next 50 Years. Number 13900 in Lecture Notes in Computer Science. Springer, 277–283. doi:10.1007/978-3-031-35254-6_23

[44]

M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, and H. T. Cory. 1986. The British Nationality Act as a Logic Program. Commun. ACM 29, 5 (May 1986), 370–386. doi:10.1145/5689.5920

[46]

Edward H. Shortliffe. 1977. Mycin: A Knowledge-Based Computer Program Applied to Infectious Diseases. Proceedings of the Annual Symposium on Computer Application in Medical Care (Oct. 1977), 66–69. https://pmc.ncbi.nlm.nih.gov/articles/PMC2464549/

[47]

David Silver et al. 2016. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 7587 (Jan. 2016), 484–489. doi:10.1038/nature16961

[48]

Aaditya Singh et al. 2025. OpenAI GPT-5 System Card. arXiv:2601.03267 [cs] doi:10.48550/arXiv.2601.03267

[49]

Detlev Sternberg-Lieben et al. 2025. § 130 StGB - Volksverhetzung. Tübinger Kommentar Strafgesetzbuch (2025), 3555

[50]

Dirk Streeb, Yannick Metz, Udo Schlegel, Bruno Schneider, Mennatallah El-Assady, Hansjörg Neth, Min Chen, and Daniel A. Keim. 2022. Task-Based Visual Interactive Modeling: Decision Trees and Rule-Based Classifiers. IEEE Transactions on Visualization and Computer Graphics 28, 9 (Sept. 2022), 3307–3323. doi:10.1109/TVCG.2020.3045560

[51]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010. https://dl.acm.org/doi/10.5555/3295222.3295349

[52]

Bart Verheij. 2003. DefLog: On the Logical Interpretation of Prima Facie Justified Assumptions. Journal of Logic and Computation 13, 3 (June 2003), 319–346. doi:10.1093/logcom/13.3.319

[53]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22). Curran Associates Inc., Red Hook, NY, USA, 24824–24837. ...

[54]

Siwei Wu et al. 2024. A Comparative Study on Reasoning Patterns of OpenAI’s O1 Model. arXiv:2410.13639 [cs] doi:10.48550/arXiv.2410.13639

[55]

Kepu Zhang, Weijie Yu, Zhongxiang Sun, and Jun Xu. 2025. SyLeR: A Framework for Explicit Syllogistic Legal Reasoning in Large Language Models. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ’25). Association for Computing Machinery, New York, NY, USA, 4117–4127. doi:10.1145/3746252.3761120

[56]

Frederike Zufall, Marius Hamacher, Katharina Kloppenborg, and Torsten Zesch. 2022. A Legal Approach to Hate Speech – Operationalizing the EU’s Legal Framework against the Expression of Hatred as an NLP Task. In Proceedings of the Natural Legal Language Processing Workshop 2022. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 53–64. doi:10.18653/v1/2022.nllp-1.5
