{"total":13,"items":[{"citing_arxiv_id":"2606.28538","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Legal Domain Adaptation of Modern BERT Models","primary_cat":"cs.CL","submitted_at":"2026-06-26T18:44:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Further pre-training ModernBERT on US court opinions improves results on legal datasets compared to the base model, with gains similar to early BERT domain adaptation work.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.19750","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-06-18T03:31:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces BMC, a manifold bandit framework that organizes problems into a hierarchical task tree and applies Bayesian learning to balance productivity, diversity, and utility in LLM curriculum sampling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09724","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Probabilistic Similarity: Structural, Temporal, and Causal Limitations of Retrieval-Augmented Generation in the Legal Domain","primary_cat":"cs.AI","submitted_at":"2026-06-08T16:46:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper identifies three pathologies of probabilistic RAG in legal retrieval (mereological blindness, diachronic blindness, causal opacity) and derives four deterministic architectural commitments to address the 
hierarchical, temporal, and institutional structure of legal knowledge.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04931","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mean-based algorithms: A lower bound and regret","primary_cat":"cs.LG","submitted_at":"2026-06-03T14:23:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Derives first lower bound on γ_t for mean-based algorithms in unknown-horizon bandit settings, proposes two new algorithms, and shows some are also no-regret.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03131","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models","primary_cat":"cs.LG","submitted_at":"2026-06-02T04:18:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HARVE removes the component of the reward-head vector aligned with a multi-directional hacking subspace from residual streams using a small set of contrastive examples, improving robustness on RewardHackBench across eight models without fine-tuning while preserving general capability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23497","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Asking For An Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-based Statutory Question Answering","primary_cat":"cs.CL","submitted_at":"2026-05-22T11:02:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMs show severe staleness after training cutoffs 
and recency bias on historical German statutes; RAG with version filtering mitigates both better than web search.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21076","ref_index":72,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GradeLegal: Automated Grading for German Legal Cases","primary_cat":"cs.CL","submitted_at":"2026-05-20T12:09:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Reasoning-oriented LLMs reach up to 0.91 quadratic weighted kappa agreement with experts on public law cases when given sample solutions and grading rubrics, but only 0.60 on criminal law cases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10604","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems","primary_cat":"cs.LG","submitted_at":"2026-05-11T14:04:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The Pareto frontier of fair algorithmic decisions consists of deterministic group-specific threshold rules on predicted success probabilities, which can include upper bounds for some fairness metrics and holds independently of model training approach.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Fairness in credit scoring: Assessment, implementation and profit implications.European Journal of Operational Research297, 3 (2022), 1083-1094. doi:10.1016/j.ejor.2021.06.023 [26] Francesca Lagioia, Riccardo Rovatti, and Giovanni Sartor. 2023. Algorithmic fairness through group parities? The case of COMPAS- SAPMOC.AI & society38, 2 (2023), 459-478. 
doi:10.1007/s00146-022-01441-y [27] Benjamin Laufer, Manish Raghavan, and Solon Barocas. 2025. What Constitutes a Less Discriminatory Algorithm?. InProceedings of the Symposium on Computer Science and Law on ZZZ (CSLA W '25). ACM, 136-151. doi:10.1145/3709025.3712214 [28] Suyun Liu and Luis Nunes Vicente. 2022. Accuracy and fairness trade-offs in machine learning: A stochastic multi-objective approach."},{"citing_arxiv_id":"2605.02950","ref_index":52,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces","primary_cat":"cs.LG","submitted_at":"2026-05-01T17:46:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"KAHM yields a compute-efficient query encoder that outperforms matched learned adapters in reconstructing a frozen Mixedbread embedding space on an Austrian-law retrieval task while delivering an 8.53x CPU speedup.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Therefore, ˜hx↦→ yc∈M c, and Theorem 1 yields, with probability at least 1− δ, E x∼ Px [⏐ ⏐ ⏐˜hx↦→ yc(x)− Py|x(yc = 1|x) ⏐ ⏐ ⏐ 2] ≤ min ( 1 N N∑ i=1 |yi c− ˜hx↦→ yc(xi)|2 + 4√ N + √ log(1/δ ) 2N , 1 (Nc/N )2 ( 3√ N + √ 8 log(1/δ ) N )) . (51) By the triangular inequality in L2(Rn, Px), we have ∥Φ c− Py|x(yc = 1|·)∥L2(Rn, Px)≤∥ Φ c− ˜hx↦→ yc∥L2(Rn, Px) +∥˜hx↦→ yc− Py|x(yc = 1|·)∥L2(Rn, Px). (52) Therefore, using (a + b)2≤ (1 + η)a2 + (1 + η− 1)b2, it follows that ∥Φ c− Py|x(yc = 1|·)∥2 L2(Rn, Px) ≤ (1 + η)∥Φ c− ˜hx↦→ yc∥2 L2(Rn, Px) + (1 + η− 1)∥˜hx↦→ yc− Py|x(yc = 1|·)∥2 L2(Rn, Px). (53) Using (49) and (51) in (53), the result is obtained. 
Crucially, Theorem 2 shows that the finite-sample contribution to the approximation error of the KAHM-induced"},{"citing_arxiv_id":"2605.00063","ref_index":94,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey of Reasoning-Intensive Retrieval: Progress and Challenges","primary_cat":"cs.IR","submitted_at":"2026-04-30T08:35:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20726","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization","primary_cat":"cs.CL","submitted_at":"2026-04-22T16:12:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Automatic prompt optimization using lenient LLM judges improves performance and transferability in legal QA evaluations compared to human design or strict judges.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04418","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Justified or Just Convincing? 
Error Verifiability as a Dimension of LLM Quality","primary_cat":"cs.HC","submitted_at":"2026-04-06T04:53:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Error verifiability is a distinct dimension of LLM quality separate from accuracy that requires targeted, domain-aware interventions like reflect-and-rephrase and oracle-rephrase to improve.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.14348","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Legal Retrieval for Public Defenders","primary_cat":"cs.IR","submitted_at":"2026-01-20T17:08:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}