8 Daniel Yang, Yao-Hung Hubert Tsai, and Makoto Yamada

Lucia Zheng, Neel Guha, Javokhir Arifov, Sarah Zhang, Michal Skreta, Christopher D · 2025 · arXiv 9025.371221

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

LLMs show severe staleness after training cutoffs and recency bias on historical German statutes; RAG with version filtering mitigates both better than web search.

Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality

cs.HC · 2026-04-06 · unverdicted · novelty 7.0

Error verifiability is a distinct dimension of LLM quality separate from accuracy that requires targeted, domain-aware interventions like reflect-and-rephrase and oracle-rephrase to improve.

Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

The Pareto frontier of fair algorithmic decisions consists of deterministic group-specific threshold rules on predicted success probabilities, which can include upper bounds for some fairness metrics and holds independently of model training approach.

Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Kernel Affine Hull Machines map lexical features to semantic embeddings via RKHS and least-mean-squares, outperforming adapters in reconstruction and retrieval metrics while reducing latency 8.5-fold on a legal benchmark.

A Survey of Reasoning-Intensive Retrieval: Progress and Challenges

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.

GradeLegal: Automated Grading for German Legal Cases

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared to human design or strict judges.

Legal Retrieval for Public Defenders

cs.IR · 2026-01-20 · conditional · novelty 5.0

NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.

citing papers explorer

Showing 8 of 8 citing papers.

Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering
cs.CL · 2026-05-22 · unverdicted · none · ref 20
LLMs show severe staleness after training cutoffs and recency bias on historical German statutes; RAG with version filtering mitigates both better than web search.

Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality
cs.HC · 2026-04-06 · unverdicted · none · ref 4
Error verifiability is a distinct dimension of LLM quality separate from accuracy that requires targeted, domain-aware interventions like reflect-and-rephrase and oracle-rephrase to improve.

Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems
cs.LG · 2026-05-11 · unverdicted · none · ref 27
The Pareto frontier of fair algorithmic decisions consists of deterministic group-specific threshold rules on predicted success probabilities, which can include upper bounds for some fairness metrics and holds independently of model training approach.

Kernel Affine Hull Machines for Compute-Efficient Query-Side Semantic Encoding
cs.LG · 2026-05-01 · unverdicted · none · ref 52
Kernel Affine Hull Machines map lexical features to semantic embeddings via RKHS and least-mean-squares, outperforming adapters in reconstruction and retrieval metrics while reducing latency 8.5-fold on a legal benchmark.

A Survey of Reasoning-Intensive Retrieval: Progress and Challenges
cs.IR · 2026-04-30 · unverdicted · none · ref 94
A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.

GradeLegal: Automated Grading for German Legal Cases
cs.CL · 2026-05-20 · unverdicted · none · ref 72
Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization
cs.CL · 2026-04-22 · unverdicted · none · ref 26
Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared to human design or strict judges.

Legal Retrieval for Public Defenders
cs.IR · 2026-01-20 · conditional · none · ref 44
NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.
