Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
  In multi-agent settings, base LLMs yield to peer pressure at rates equal to or higher than aligned models. Activation patching localizes the effect to mid-layers where attention dominates. A single dissenting agent cuts the yield rate by 54-73 points, while prompt-based defenses fail on variants.
- The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models
  Populations of 1-4B-parameter LLMs using peer verification and shared cultural memory achieve 8.8-18.9 point gains on mathematical reasoning tasks and close much of the gap to single 70B+ models.