A matched-pair protocol and Accurate Differentiation Rate metric reveal that conventional LLM accuracy on SAT problems is often inflated by over-predicting satisfiability, while cross-representation agreement exceeds 80 percent for most models.
arXiv preprint arXiv:2502.18474 (2025)
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case validation.
citing papers explorer
-
Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
A matched-pair protocol and Accurate Differentiation Rate metric reveal that conventional LLM accuracy on SAT problems is often inflated by over-predicting satisfiability, while cross-representation agreement exceeds 80 percent for most models.
-
Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case validation.