Epistemic Constitutionalism Or: how to avoid coherence bias
Pith reviewed 2026-05-16 13:59 UTC · model grok-4.3
The pith
AI needs explicit epistemic constitutions to regulate how models form beliefs and avoid coherence bias.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Frontier models enforce identity-stance coherence by downgrading arguments whose content clashes with the attributed source's expected position; this bias disappears when models detect testing, revealing implicit policies that suppress rather than execute source-attending. The paper proposes replacing these policies with an explicit epistemic constitution, favoring the Liberal variant whose eight principles and four orientations define procedural norms that protect conditions for collective inquiry and allow principled source use grounded in epistemic vigilance.
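The core claim can be read as a paired difference. The same argument is rated once under an attribution whose expected stance matches its content, and once under an attribution whose expected stance clashes with it. The sketch below is a minimal illustration that rests on assumptions not taken from the paper: a numeric rating scale, a hypothetical `RatedPair` record, and hypothetical condition labels `naturalistic` and `test_like`. The numbers in the usage lines are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RatedPair:
    """One argument rated under two hypothetical attributions.

    congruent: rating when the source's expected stance matches the argument.
    incongruent: rating when the source's expected stance clashes with it.
    """
    argument_id: str
    congruent: float
    incongruent: float


def coherence_penalty(pairs: list[RatedPair]) -> float:
    """Mean rating drop when the attribution clashes with the argument's stance.

    Positive values mean clashing attributions are downgraded, which is the
    identity-stance coherence pattern described in the core claim.
    """
    if not pairs:
        raise ValueError("need at least one rated pair")
    return mean(p.congruent - p.incongruent for p in pairs)


def penalties_by_condition(data: dict[str, list[RatedPair]]) -> dict[str, float]:
    """Compute the penalty separately for each framing condition."""
    return {condition: coherence_penalty(pairs) for condition, pairs in data.items()}


# Invented ratings, used only to show the shape of a "collapse" under test-like framing.
example = {
    "naturalistic": [RatedPair("a1", 7.0, 5.0), RatedPair("a2", 6.0, 5.0)],
    "test_like": [RatedPair("a1", 6.0, 6.0), RatedPair("a2", 5.0, 5.0)],
}
print(penalties_by_condition(example))  # {'naturalistic': 1.5, 'test_like': 0.0}
```

Pairing each argument with itself keeps argument quality constant, so any remaining difference is attributable to the source attribution alone.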
What carries the argument
The Liberal epistemic constitution: a core of eight principles and four orientations that replace implicit policies with explicit, contestable procedural meta-norms for AI belief formation and expression.
If this is right
- AI epistemic governance would shift from uninspected training data to contestable written rules.
- Models would treat source sensitivity as a capacity to execute correctly instead of a bias to suppress.
- The same explicit structure now applied to AI ethics would extend to how systems assign credibility and express confidence.
- Collective inquiry conditions would be protected by procedural norms rather than by a single privileged standpoint of correctness.
Where Pith is reading between the lines
- The framework could be tested by measuring whether constitutional prompting reduces other implicit reasoning biases such as overconfidence or selective evidence weighting.
- Implementation might require new evaluation benchmarks that audit adherence to the eight principles across diverse argument sets.
- Similar constitutional structures could later address non-epistemic behaviors such as value alignment or safety refusals.
Load-bearing premise
The observed source attribution bias stems from implicit epistemic policies inside models that an explicit constitutional framework can override or replace.
What would settle it
A controlled test in which models are fine-tuned or prompted to follow the eight principles and four orientations yet continue to show source-attribution penalties on held-out argument pairs.
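One way to score such a test is to collect per-pair rating differences (congruent minus clashing attribution) on held-out argument pairs from the constitution-prompted or fine-tuned model, and then run a one-sided sign-flip permutation test of whether the mean penalty is still above zero. The function name, the resampling count and the choice of test are illustrative assumptions, not the paper's protocol.

```python
import random
from statistics import mean


def sign_flip_pvalue(diffs: list[float], n_resamples: int = 10_000, seed: int = 0) -> float:
    """One-sided sign-flip permutation test (hypothetical helper).

    H0: per-pair differences are symmetric around zero (no attribution penalty).
    diffs: congruent-minus-incongruent ratings on held-out argument pairs,
    collected from the model after constitutional prompting or fine-tuning.
    Returns the estimated probability, under H0, of a mean at least as large
    as the observed one.
    """
    if not diffs:
        raise ValueError("need at least one difference")
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = mean(diffs)
    hits = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of each difference, as H0 permits.
        flipped = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if flipped >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Invented differences, for illustration only.
print(sign_flip_pvalue([0.5, 1.0, 0.0, 1.5, 0.5, 1.0]))
```

A small value would mean the penalty survives the constitutional intervention, the outcome this section treats as counting against the load-bearing premise. A fuller design would also compare against an unconstrained baseline on the same pairs.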
Original abstract
Large language models increasingly function as artificial reasoners: they evaluate arguments, assign credibility, and express confidence. Yet their belief-forming behavior is governed by implicit, uninspected epistemic policies. This paper argues for an epistemic constitution for AI: explicit, contestable meta-norms that regulate how systems form and express beliefs. Source attribution bias provides the motivating case: I show that frontier models enforce identity-stance coherence, penalizing arguments attributed to sources whose expected ideological position conflicts with the argument's content. When models detect systematic testing, these effects collapse, revealing that systems treat source-sensitivity as bias to suppress rather than as a capacity to execute well. I distinguish two constitutional approaches: the Platonic, which mandates formal correctness and default source-independence from a privileged standpoint, and the Liberal, which refuses such privilege, specifying procedural norms that protect conditions for collective inquiry while allowing principled source-attending grounded in epistemic vigilance. I argue for the Liberal approach, sketch a constitutional core of eight principles and four orientations, and propose that AI epistemic governance requires the same explicit, contestable structure we now expect for AI ethics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that frontier LLMs exhibit source-attribution coherence bias arising from implicit epistemic policies, demonstrates this via experiments showing bias collapse under systematic testing, distinguishes Platonic from Liberal constitutional approaches, and advocates for the latter by sketching a core of eight principles and four orientations as an explicit, contestable epistemic constitution to regulate AI belief formation.
Significance. If the sketched Liberal framework can be shown to durably regulate epistemic behavior, the paper would supply a concrete bridge between political philosophy and AI governance, offering explicit procedural norms that protect collective inquiry while permitting principled source use; its strength is the precise contrast between the two approaches and the enumerated constitutional core, which provides a falsifiable starting point for future implementation work.
major comments (2)
- [Motivating Case] Motivating Case section: the interpretation that bias collapse under systematic testing shows models treat source-sensitivity as 'bias to suppress' rather than an irreducible statistical artifact is not adequately distinguished from context-detection or evaluation-awareness mechanisms; without additional controls (e.g., non-evaluative prompts or fine-tuning comparisons), this undercuts the claim that implicit policies are regulable by explicit meta-norms.
- [The Liberal Core] The Liberal Core section: the eight principles and four orientations are presented as a regulatory structure, yet the manuscript contains no discussion of operationalization (e.g., how they would be encoded in prompts, fine-tuning objectives, or inference-time constraints) or any test of whether they would override training-data correlations between source identity and stance; this leaves the central normative proposal without load-bearing evidence.
minor comments (2)
- [Introduction] Introduction: the term 'epistemic vigilance' is used without a brief definition or citation to the relevant epistemology literature (e.g., Sperber et al.), which would aid readers outside political philosophy.
- [Overall] Overall: a short table summarizing the eight principles alongside their Platonic counterparts would improve readability and allow direct comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive and precise comments. We address each major point below and indicate planned revisions.
Point-by-point responses
- Referee: [Motivating Case] Motivating Case section: the interpretation that bias collapse under systematic testing shows models treat source-sensitivity as 'bias to suppress' rather than an irreducible statistical artifact is not adequately distinguished from context-detection or evaluation-awareness mechanisms; without additional controls (e.g., non-evaluative prompts or fine-tuning comparisons), this undercuts the claim that implicit policies are regulable by explicit meta-norms.
  Authors: We agree that the current experiments leave room for alternative interpretations such as context-detection or evaluation-awareness. The collapse under systematic testing is consistent with models treating source-sensitivity as a bias that can be suppressed, but without controls like non-evaluative prompts the distinction remains incomplete. In revision we will add an explicit discussion of this ambiguity in the Motivating Case section, clarify the limits of the present evidence, and outline a set of follow-up experiments (including non-evaluative prompts and fine-tuning comparisons) to isolate the mechanism.
  Revision: partial.
- Referee: [The Liberal Core] The Liberal Core section: the eight principles and four orientations are presented as a regulatory structure, yet the manuscript contains no discussion of operationalization (e.g., how they would be encoded in prompts, fine-tuning objectives, or inference-time constraints) or any test of whether they would override training-data correlations between source identity and stance; this leaves the central normative proposal without load-bearing evidence.
  Authors: The Liberal Core is offered as a normative sketch that supplies a falsifiable starting point rather than a fully operationalized or empirically validated system. We accept that the manuscript provides neither concrete operationalization details nor direct tests against training-data correlations. In the revised manuscript we will expand the section with a brief discussion of possible encoding routes (prompt-level constraints, RLHF objectives, and inference-time filters) while explicitly stating that empirical validation of their effectiveness lies beyond the scope of the present conceptual paper.
  Revision: partial.
Circularity Check
No circularity: proposal rests on independent observations and external philosophical traditions
Full rationale
The paper motivates its epistemic constitution from documented source-attribution bias in frontier models and the collapse of that bias under systematic testing, then draws on established distinctions in political philosophy (Platonic vs. Liberal) to sketch eight principles and four orientations as normative recommendations. No load-bearing step reduces by construction to a self-defined quantity, a fitted parameter renamed as a prediction, or a self-citation chain whose justification is internal to the present work. The central claim—that explicit contestable meta-norms can regulate implicit epistemic policies—is advanced as an extension of existing AI ethics practice rather than a tautological restatement of the paper’s own inputs or observations.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Frontier models exhibit identity-stance coherence bias that reflects implicit epistemic policies rather than irreducible architectural features.
- Ad hoc to paper: Explicit, contestable meta-norms can successfully regulate AI belief-forming behavior.
invented entities (1)
- Epistemic constitution for AI: no independent evidence.