
arxiv: 2606.18158 · v1 · pith:7H36QFJUnew · submitted 2026-06-16 · 💻 cs.CY · cs.AI · cs.CL

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

Pith reviewed 2026-06-26 22:13 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.CL
keywords EU AI Act · doctrinal legal reasoning · legal AI benchmarks · high-risk AI · judicial domain · measurement gap · automation of law · AI regulation

The pith

No existing benchmark evaluates doctrinal legal reasoning, leaving the EU AI Act's accuracy requirement for high-risk judicial AI without operational content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that large language models can generate legal text of at least median quality, but that existing benchmarks test only ancillary paralegal tasks, not doctrinal legal reasoning. Doctrinal reasoning is presented as the interpretive core of legal work: the analysis of statutes, precedents, and principles to reach conclusions. The absence of a benchmark for this skill is a methodological shortfall that also blocks compliance with the EU AI Act, which imposes a binding "appropriate accuracy" obligation on high-risk AI systems used in the judicial domain. Without a way to measure doctrinal performance, that regulatory standard cannot be given concrete meaning or enforced.

Core claim

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.

What carries the argument

Doctrinal legal reasoning, defined as the interpretive core of legal work that applies statutes, precedents, and principles to reach case-specific conclusions. Existing benchmarks do not measure it, and the EU AI Act's accuracy rule cannot become operational until something does.

If this is right

  • High-risk AI systems deployed in EU judicial contexts cannot demonstrate compliance with the accuracy obligation without new evaluation methods.
  • Legal-AI development must prioritize tasks that test interpretive reasoning rather than document drafting or retrieval alone.
  • Regulators and deployers lack a concrete metric to assess whether an AI tool meets the EU AI Act standard in the judicial domain.
  • The regulatory requirement for appropriate accuracy remains unenforceable in practice until a doctrinal-reasoning benchmark exists.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI systems intended for legal use in the EU may need to incorporate explicit mechanisms for tracking and explaining doctrinal steps rather than relying solely on output quality.
  • Courts and legal-service providers could face delays in adopting AI tools until evaluation standards catch up with the Act's requirements.
  • The same measurement gap may appear in other jurisdictions that impose accuracy or reliability rules on AI in legal or adjudicative settings.

Load-bearing premise

Doctrinal legal reasoning is meaningfully distinct from tasks already measured by current legal-AI benchmarks and is the specific capability needed to give operational content to the EU AI Act's accuracy requirement.

What would settle it

Creation and application of a benchmark that isolates doctrinal legal reasoning and produces measurable scores showing whether current models satisfy or fail the EU AI Act's appropriate-accuracy threshold for judicial systems.

original abstract

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that large language models can produce legal text of at least median quality, but that no existing benchmark evaluates whether they perform doctrinal legal reasoning (the interpretive core of legal work) rather than the ancillary paralegal tasks measured by current legal-AI evaluations. It argues that this methodological gap is also legal in nature: the EU AI Act imposes a binding 'appropriate accuracy' requirement for high-risk AI in the judicial domain, yet this requirement cannot acquire operational content without a doctrinal-reasoning benchmark.

Significance. If the claims hold, the paper would identify a timely intersection between AI benchmarking practices and the operational requirements of the EU AI Act for high-risk systems. It would underscore the need for evaluation methods that target core interpretive legal tasks, potentially guiding future benchmark development and informing how 'appropriate accuracy' is demonstrated in regulatory contexts. The argument draws on the Act's high-risk classification for judicial AI without relying on fitted parameters or derivations.

major comments (2)
  1. [Abstract] The assertion that 'no existing benchmark can evaluate whether they perform doctrinal legal reasoning' and that current evaluations are limited to 'ancillary, paralegal tasks' is presented without a survey of existing legal-AI benchmarks, specific examples of their scope, or evidence distinguishing doctrinal reasoning from measured tasks. This is load-bearing for the central claim of a measurement gap.
  2. [Abstract] The claim that the EU AI Act's 'appropriate accuracy' requirement 'cannot acquire operational content' without a doctrinal-reasoning benchmark lacks any textual analysis of the Act (e.g., Article 15, Annex III, or recitals) or discussion of why alternatives such as outcome validation or conformity assessment are insufficient. This premise is central to linking the methodological gap to a legal obligation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help strengthen the manuscript. We address the major comments below.

point-by-point responses
  1. Referee: [Abstract] The assertion that 'no existing benchmark can evaluate whether they perform doctrinal legal reasoning' and that current evaluations are limited to 'ancillary, paralegal tasks' is presented without a survey of existing legal-AI benchmarks, specific examples of their scope, or evidence distinguishing doctrinal reasoning from measured tasks. This is load-bearing for the central claim of a measurement gap.

    Authors: The referee is correct that the abstract presents this claim without supporting survey or examples. In the revised manuscript, we will expand the abstract to include a concise overview of prominent legal AI benchmarks (e.g., LegalBench, CUAD, and Case Law datasets) and explain how they primarily assess retrieval, classification, or outcome prediction rather than the interpretive doctrinal reasoning central to judicial decision-making. This will provide the necessary grounding for the measurement gap claim.
    revision: yes

  2. Referee: [Abstract] The claim that the EU AI Act's 'appropriate accuracy' requirement 'cannot acquire operational content' without a doctrinal-reasoning benchmark lacks any textual analysis of the Act (e.g., Article 15, Annex III, or recitals) or discussion of why alternatives such as outcome validation or conformity assessment are insufficient. This premise is central to linking the methodological gap to a legal obligation.

    Authors: We agree that the abstract would be strengthened by explicit reference to the Act's text. The full paper analyzes the relevant provisions in detail, but to address the comment, we will revise the abstract to briefly cite Article 15(1) and Annex III point 8, noting that 'appropriate accuracy' must be demonstrated in the context of the AI system's intended purpose in judicial decision-making. We will also add a sentence explaining why post-hoc outcome validation is insufficient for high-risk systems under the Act's risk-based approach, as it does not address the reasoning process itself. This revision will be made without altering the core argument.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity; conceptual gap identification relies on external legal text.

full rationale

The paper advances a conceptual argument identifying the absence of benchmarks for doctrinal legal reasoning and links this to the EU AI Act's 'appropriate accuracy' requirement for high-risk AI. No equations, derivations, fitted parameters, or self-citations appear in the provided text. The central claim does not reduce by construction to its own inputs; it references an independent external statute and existing evaluation methods without internal self-referential loops or load-bearing self-citations. This is a standard non-circular identification of a research gap.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only. The central claim rests on two domain assumptions about the nature of legal work and the implications of the EU AI Act; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Doctrinal legal reasoning forms the interpretive core of legal work and is distinct from ancillary paralegal tasks measured by existing benchmarks.
    Stated directly in the abstract as the foundation for claiming a measurement gap.
  • domain assumption: The EU AI Act's 'appropriate accuracy' requirement for high-risk judicial AI cannot acquire operational content without a doctrinal-reasoning benchmark.
    Central legal premise linking the technical gap to regulatory enforceability.

pith-pipeline@v0.9.1-grok · 5613 in / 1314 out tokens · 40583 ms · 2026-06-26T22:13:41.417544+00:00 · methodology

