Mind your tone: Investigating how prompt politeness affects LLM accuracy (short paper)
4 Pith papers cite this work.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Qiskit Code Migration with LLMs
  A taxonomy-guided RAG system with LLMs reduces hallucinations and improves migration suggestions for Qiskit code compared to unconstrained retrieval.
- Legal Reasoning Is Not Lawyering: Rethinking Legal Benchmarks for Pro Se Access to Justice
  Legal AI benchmarks must evaluate robustness to pro se litigant inputs rather than expert-preprocessed ones to support access-to-justice claims.
- Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits
  Toxic prompt perturbations reduce LLM factual accuracy on three benchmarks and selectively amplify perturbation-sensitive nodes in attribution graphs.
- From GPT-3 to GPT-5: Mapping their capabilities, scope, limitations, and consequences
  The GPT family has shifted from scaled text predictors to aligned multimodal tool-oriented systems, with persistent limitations like hallucination and prompt sensitivity remaining unchanged.