LLMs show severe staleness after training cutoffs and recency bias on historical German statutes; RAG with version filtering mitigates both better than web search.
8 Daniel Yang, Yao-Hung Hubert Tsai, and Makoto Yamada
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8roles
background 2polarities
background 2representative citing papers
Error verifiability is a distinct dimension of LLM quality separate from accuracy that requires targeted, domain-aware interventions like reflect-and-rephrase and oracle-rephrase to improve.
The Pareto frontier of fair algorithmic decisions consists of deterministic group-specific threshold rules on predicted success probabilities, which can include upper bounds for some fairness metrics and holds independently of model training approach.
Kernel Affine Hull Machines map lexical features to semantic embeddings via RKHS and least-mean-squares, outperforming adapters in reconstruction and retrieval metrics while reducing latency 8.5-fold on a legal benchmark.
A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.
Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.
Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared to human design or strict judges.
NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.
citing papers explorer
-
Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering
LLMs show severe staleness after training cutoffs and recency bias on historical German statutes; RAG with version filtering mitigates both better than web search.
-
Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality
Error verifiability is a distinct dimension of LLM quality separate from accuracy that requires targeted, domain-aware interventions like reflect-and-rephrase and oracle-rephrase to improve.
-
Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems
The Pareto frontier of fair algorithmic decisions consists of deterministic group-specific threshold rules on predicted success probabilities, which can include upper bounds for some fairness metrics and holds independently of model training approach.
-
Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding
Kernel Affine Hull Machines map lexical features to semantic embeddings via RKHS and least-mean-squares, outperforming adapters in reconstruction and retrieval metrics while reducing latency 8.5-fold on a legal benchmark.
-
A Survey of Reasoning-Intensive Retrieval: Progress and Challenges
A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.
-
GradeLegal: Automated Grading for German Legal Cases
Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.
-
Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization
Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared to human design or strict judges.
-
Legal Retrieval for Public Defenders
NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.