hub

Manning, Peter Henderson, and Daniel E

Lucia Zheng, Neel Guha, Javokhir Arifov, Sarah Zhang, Michal Skreta, Christopher D · 2025 · arXiv 9025.371221

13 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Mean-based algorithms: A lower bound and regret

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Derives first lower bound on γ_t for mean-based algorithms in unknown-horizon bandit settings, proposes two new algorithms, and shows some are also no-regret.

Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

LLMs show severe staleness after training cutoffs and recency bias on historical German statutes; RAG with version filtering mitigates both better than web search.

Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality

cs.HC · 2026-04-06 · unverdicted · novelty 7.0

Error verifiability is a distinct dimension of LLM quality separate from accuracy that requires targeted, domain-aware interventions like reflect-and-rephrase and oracle-rephrase to improve.

Beyond Probabilistic Similarity: Structural, Temporal, and Causal Limitations of Retrieval-Augmented Generation in the Legal Domain

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

The paper identifies three pathologies of probabilistic RAG in legal retrieval (mereological blindness, diachronic blindness, causal opacity) and derives four deterministic architectural commitments to address the hierarchical, temporal, and institutional structure of legal knowledge.

HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

HARVE removes the component of the reward-head vector aligned with a multi-directional hacking subspace from residual streams using a small set of contrastive examples, improving robustness on RewardHackBench across eight models without fine-tuning while preserving general capability.

Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

The Pareto frontier of fair algorithmic decisions consists of deterministic group-specific threshold rules on predicted success probabilities, which can include upper bounds for some fairness metrics and holds independently of model training approach.

Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces

cs.LG · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

KAHM yields a compute-efficient query encoder that outperforms matched learned adapters in reconstructing a frozen Mixedbread embedding space on an Austrian-law retrieval task while delivering an 8.53x CPU speedup.

A Survey of Reasoning-Intensive Retrieval: Progress and Challenges

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

cs.LG · 2026-06-18 · unverdicted · novelty 5.0

Introduces BMC, a manifold bandit framework that organizes problems into a hierarchical task tree and applies Bayesian learning to balance productivity, diversity, and utility in LLM curriculum sampling.

GradeLegal: Automated Grading for German Legal Cases

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared to human design or strict judges.

Legal Retrieval for Public Defenders

cs.IR · 2026-01-20 · conditional · novelty 5.0

NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.

Legal Domain Adaptation of Modern BERT Models

cs.CL · 2026-06-26 · unverdicted · novelty 3.0

Further pre-training ModernBERT on US court opinions improves results on legal datasets compared to the base model, with gains similar to early BERT domain adaptation work.

citing papers explorer

Showing 12 of 12 citing papers after filters.

Mean-based algorithms: A lower bound and regret · cs.LG · 2026-06-03 · unverdicted · none · ref 19
Derives first lower bound on γ_t for mean-based algorithms in unknown-horizon bandit settings, proposes two new algorithms, and shows some are also no-regret.
Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering · cs.CL · 2026-05-22 · unverdicted · none · ref 20
LLMs show severe staleness after training cutoffs and recency bias on historical German statutes; RAG with version filtering mitigates both better than web search.
Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality · cs.HC · 2026-04-06 · unverdicted · none · ref 4
Error verifiability is a distinct dimension of LLM quality separate from accuracy that requires targeted, domain-aware interventions like reflect-and-rephrase and oracle-rephrase to improve.
Beyond Probabilistic Similarity: Structural, Temporal, and Causal Limitations of Retrieval-Augmented Generation in the Legal Domain · cs.AI · 2026-06-08 · unverdicted · none · ref 20
The paper identifies three pathologies of probabilistic RAG in legal retrieval (mereological blindness, diachronic blindness, causal opacity) and derives four deterministic architectural commitments to address the hierarchical, temporal, and institutional structure of legal knowledge.
HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models · cs.LG · 2026-06-02 · unverdicted · none · ref 15
HARVE removes the component of the reward-head vector aligned with a multi-directional hacking subspace from residual streams using a small set of contrastive examples, improving robustness on RewardHackBench across eight models without fine-tuning while preserving general capability.
Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems · cs.LG · 2026-05-11 · unverdicted · none · ref 27
The Pareto frontier of fair algorithmic decisions consists of deterministic group-specific threshold rules on predicted success probabilities, which can include upper bounds for some fairness metrics and holds independently of model training approach.
Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces · cs.LG · 2026-05-01 · unverdicted · none · ref 52 · 2 links
KAHM yields a compute-efficient query encoder that outperforms matched learned adapters in reconstructing a frozen Mixedbread embedding space on an Austrian-law retrieval task while delivering an 8.53x CPU speedup.
A Survey of Reasoning-Intensive Retrieval: Progress and Challenges · cs.IR · 2026-04-30 · unverdicted · none · ref 94
A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models · cs.LG · 2026-06-18 · unverdicted · none · ref 8
Introduces BMC, a manifold bandit framework that organizes problems into a hierarchical task tree and applies Bayesian learning to balance productivity, diversity, and utility in LLM curriculum sampling.
GradeLegal: Automated Grading for German Legal Cases · cs.CL · 2026-05-20 · unverdicted · none · ref 72
Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.
Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization · cs.CL · 2026-04-22 · unverdicted · none · ref 26
Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared to human design or strict judges.
Legal Domain Adaptation of Modern BERT Models · cs.CL · 2026-06-26 · unverdicted · none · ref 37
Further pre-training ModernBERT on US court opinions improves results on legal datasets compared to the base model, with gains similar to early BERT domain adaptation work.
