URL https://www.nature.com/articles/s41586-024-07930-y

Zhou, L · 2024 · DOI 10.1038/s41586-024-07930-y

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill

cs.SE · 2026-06-04 · conditional · novelty 7.0

Controlled ablation finds Popperian code-generation skill adds no separable correctness benefit over labels-only scaffold; gains track structure not content.

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

POIROT protocol repurposes agents in LLM multi-agent systems as an internal diagnostic layer for failure detection, outperforming single-LLM evaluators with gains that increase with complexity, agent count, and fault types.

Consistency Training while Mitigating Obfuscation via Rate Matching

cs.CL · 2026-06-01 · unverdicted · novelty 6.0

RMCT matches the rate of target behaviors like bias-following across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.

What Would GPT Click: Practical Effects of Human-AI Behavioral Misalignment and the Cost of Synthetic Participants in User Experience

cs.HC · 2026-05-18 · unverdicted · novelty 5.0

GPT produces click distributions significantly different from real humans in 53% of UX first-click tasks, with prompting techniques like personas and chain-of-thought failing to improve alignment.

Predicting Performance of Symbolic and Prompt Programs with Examples

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

Proposes RAP, a retrieval-based approximate prior method, to predict performance of symbolic programs and LLM prompts on new tasks using a Bernoulli model and corpus-derived performance distributions.

A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

The paper introduces a reproducible optimization protocol for prompt-based LLM workflows in evidence synthesis that separates task definitions from prompt harnesses, optimizes the harness against metrics and examples, and preserves the result as an inspectable artefact.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill

cs.SE · 2026-06-04 · conditional · none · ref 51

Controlled ablation finds Popperian code-generation skill adds no separable correctness benefit over labels-only scaffold; gains track structure not content.

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

cs.AI · 2026-06-01 · unverdicted · none · ref 8

POIROT protocol repurposes agents in LLM multi-agent systems as an internal diagnostic layer for failure detection, outperforming single-LLM evaluators with gains that increase with complexity, agent count, and fault types.

Consistency Training while Mitigating Obfuscation via Rate Matching

cs.CL · 2026-06-01 · unverdicted · none · ref 62

RMCT matches the rate of target behaviors like bias-following across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.

What Would GPT Click: Practical Effects of Human-AI Behavioral Misalignment and the Cost of Synthetic Participants in User Experience

cs.HC · 2026-05-18 · unverdicted · none · ref 31

GPT produces click distributions significantly different from real humans in 53% of UX first-click tasks, with prompting techniques like personas and chain-of-thought failing to improve alignment.

Predicting Performance of Symbolic and Prompt Programs with Examples

cs.LG · 2026-05-15 · unverdicted · none · ref 11

Proposes RAP, a retrieval-based approximate prior method, to predict performance of symbolic programs and LLM prompts on new tasks using a Bernoulli model and corpus-derived performance distributions.

A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis

cs.LG · 2026-05-07 · unverdicted · none · ref 11

The paper introduces a reproducible optimization protocol for prompt-based LLM workflows in evidence synthesis that separates task definitions from prompt harnesses, optimizes the harness against metrics and examples, and preserves the result as an inspectable artefact.
