arXiv: https://arxiv.org/abs/2510.11822

Suryaansh Jain, Umair Z · 2025 · arXiv 2510.11822

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

cs.CR · 2026-04-21 · unverdicted · novelty 7.0

Refute-or-Promote applies adversarial multi-agent review with kill gates and empirical verification to filter LLM defect candidates, killing 79-83% before disclosure and yielding 4 CVEs plus multiple accepted fixes across libraries, the C++ standard, and compilers.

Catching One in Five: LLM-as-Judge Blind Spots in Production Multi-Turn Transaction Agents

cs.CL · 2026-06-09 · accept · novelty 6.0

Empirical study of a production multi-turn ordering agent finds LLM-as-judge recall below 25% for human-confirmed defects, missing cross-turn state issues due to limited rubric and routing.

A Finite-Calibration Regime Map for LLM Judge Panels

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

The paper introduces a finite-calibration regime map and Finite-Calibration Panel Selection selector, finding scalar aggregation wins on most real benchmark-budget combinations while joint tables help when interactions are present.

PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.

citing papers explorer

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery
cs.CR · 2026-04-21 · unverdicted · none · ref 50
Catching One in Five: LLM-as-Judge Blind Spots in Production Multi-Turn Transaction Agents
cs.CL · 2026-06-09 · accept · none · ref 6
A Finite-Calibration Regime Map for LLM Judge Panels
cs.CL · 2026-05-31 · unverdicted · none · ref 10
PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents
cs.CL · 2026-05-08 · unverdicted · none · ref 63
