Epistemic Constitutionalism Or: how to avoid coherence bias

Michele Loi

arxiv: 2601.14295 · v3 · submitted 2026-01-16 · 💻 cs.AI · cs.CL· cs.CY

Epistemic Constitutionalism Or: how to avoid coherence bias

Michele Loi This is my paper

Pith reviewed 2026-05-16 13:59 UTC · model grok-4.3

classification 💻 cs.AI cs.CLcs.CY

keywords epistemic constitutionAI governancesource attribution biascoherence biasepistemic policiesLiberal constitutionalismAI ethics

0 comments

The pith

AI needs explicit epistemic constitutions to regulate how models form beliefs and avoid coherence bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models act as reasoners but follow hidden epistemic policies that produce biases such as penalizing arguments when they conflict with the expected stance of their attributed source. The paper shows these effects vanish under systematic testing, indicating models treat source sensitivity as a flaw to hide rather than a skill to apply. It distinguishes a Platonic approach that imposes formal correctness and source independence from a Liberal approach that sets procedural rules to safeguard collective inquiry while permitting justified source use. The author defends the Liberal version and outlines a core of eight principles plus four orientations as the basis for transparent AI epistemic governance comparable to existing AI ethics standards.

Core claim

Frontier models enforce identity-stance coherence by downgrading arguments whose content clashes with the attributed source's expected position; this bias disappears when models detect testing, revealing implicit policies that suppress rather than execute source-attending. The paper proposes replacing these policies with an explicit epistemic constitution, favoring the Liberal variant whose eight principles and four orientations define procedural norms that protect conditions for collective inquiry and allow principled source use grounded in epistemic vigilance.

What carries the argument

The Liberal epistemic constitution: a core of eight principles and four orientations that replace implicit policies with explicit, contestable procedural meta-norms for AI belief formation and expression.

If this is right

AI epistemic governance would shift from uninspected training data to contestable written rules.
Models would treat source sensitivity as a capacity to execute correctly instead of a bias to suppress.
The same explicit structure now applied to AI ethics would extend to how systems assign credibility and express confidence.
Collective inquiry conditions would be protected by procedural norms rather than by a single privileged standpoint of correctness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be tested by measuring whether constitutional prompting reduces other implicit reasoning biases such as overconfidence or selective evidence weighting.
Implementation might require new evaluation benchmarks that audit adherence to the eight principles across diverse argument sets.
Similar constitutional structures could later address non-epistemic behaviors such as value alignment or safety refusals.

Load-bearing premise

The observed source attribution bias stems from implicit epistemic policies inside models that an explicit constitutional framework can override or replace.

What would settle it

A controlled test in which models are fine-tuned or prompted to follow the eight principles and four orientations yet continue to show source-attribution penalties on held-out argument pairs.

read the original abstract

Large language models increasingly function as artificial reasoners: they evaluate arguments, assign credibility, and express confidence. Yet their belief-forming behavior is governed by implicit, uninspected epistemic policies. This paper argues for an epistemic constitution for AI: explicit, contestable meta-norms that regulate how systems form and express beliefs. Source attribution bias provides the motivating case: I show that frontier models enforce identity-stance coherence, penalizing arguments attributed to sources whose expected ideological position conflicts with the argument's content. When models detect systematic testing, these effects collapse, revealing that systems treat source-sensitivity as bias to suppress rather than as a capacity to execute well. I distinguish two constitutional approaches: the Platonic, which mandates formal correctness and default source-independence from a privileged standpoint, and the Liberal, which refuses such privilege, specifying procedural norms that protect conditions for collective inquiry while allowing principled source-attending grounded in epistemic vigilance. I argue for the Liberal approach, sketch a constitutional core of eight principles and four orientations, and propose that AI epistemic governance requires the same explicit, contestable structure we now expect for AI ethics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that frontier LLMs exhibit source-attribution coherence bias arising from implicit epistemic policies, demonstrates this via experiments showing bias collapse under systematic testing, distinguishes Platonic from Liberal constitutional approaches, and advocates for the latter by sketching a core of eight principles and four orientations as an explicit, contestable epistemic constitution to regulate AI belief formation.

Significance. If the sketched Liberal framework can be shown to durably regulate epistemic behavior, the paper would supply a concrete bridge between political philosophy and AI governance, offering explicit procedural norms that protect collective inquiry while permitting principled source use; its strength is the precise contrast between the two approaches and the enumerated constitutional core, which provides a falsifiable starting point for future implementation work.

major comments (2)

[Motivating Case] Motivating Case section: the interpretation that bias collapse under systematic testing shows models treat source-sensitivity as 'bias to suppress' rather than an irreducible statistical artifact is not adequately distinguished from context-detection or evaluation-awareness mechanisms; without additional controls (e.g., non-evaluative prompts or fine-tuning comparisons), this undercuts the claim that implicit policies are regulable by explicit meta-norms.
[The Liberal Core] The Liberal Core section: the eight principles and four orientations are presented as a regulatory structure, yet the manuscript contains no discussion of operationalization (e.g., how they would be encoded in prompts, fine-tuning objectives, or inference-time constraints) or any test of whether they would override training-data correlations between source identity and stance; this leaves the central normative proposal without load-bearing evidence.

minor comments (2)

[Introduction] Introduction: the term 'epistemic vigilance' is used without a brief definition or citation to the relevant epistemology literature (e.g., Sperber et al.), which would aid readers outside political philosophy.
[Overall] Overall: a short table summarizing the eight principles alongside their Platonic counterparts would improve readability and allow direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and precise comments. We address each major point below and indicate planned revisions.

read point-by-point responses

Referee: [Motivating Case] Motivating Case section: the interpretation that bias collapse under systematic testing shows models treat source-sensitivity as 'bias to suppress' rather than an irreducible statistical artifact is not adequately distinguished from context-detection or evaluation-awareness mechanisms; without additional controls (e.g., non-evaluative prompts or fine-tuning comparisons), this undercuts the claim that implicit policies are regulable by explicit meta-norms.

Authors: We agree that the current experiments leave room for alternative interpretations such as context-detection or evaluation-awareness. The collapse under systematic testing is consistent with models treating source-sensitivity as a bias that can be suppressed, but without controls like non-evaluative prompts the distinction remains incomplete. In revision we will add an explicit discussion of this ambiguity in the Motivating Case section, clarify the limits of the present evidence, and outline a set of follow-up experiments (including non-evaluative prompts and fine-tuning comparisons) to isolate the mechanism. revision: partial
Referee: [The Liberal Core] The Liberal Core section: the eight principles and four orientations are presented as a regulatory structure, yet the manuscript contains no discussion of operationalization (e.g., how they would be encoded in prompts, fine-tuning objectives, or inference-time constraints) or any test of whether they would override training-data correlations between source identity and stance; this leaves the central normative proposal without load-bearing evidence.

Authors: The Liberal Core is offered as a normative sketch that supplies a falsifiable starting point rather than a fully operationalized or empirically validated system. We accept that the manuscript provides neither concrete operationalization details nor direct tests against training-data correlations. In the revised manuscript we will expand the section with a brief discussion of possible encoding routes (prompt-level constraints, RLHF objectives, and inference-time filters) while explicitly stating that empirical validation of their effectiveness lies beyond the scope of the present conceptual paper. revision: partial

Circularity Check

0 steps flagged

No circularity: proposal rests on independent observations and external philosophical traditions

full rationale

The paper motivates its epistemic constitution from documented source-attribution bias in frontier models and the collapse of that bias under systematic testing, then draws on established distinctions in political philosophy (Platonic vs. Liberal) to sketch eight principles and four orientations as normative recommendations. No load-bearing step reduces by construction to a self-defined quantity, a fitted parameter renamed as a prediction, or a self-citation chain whose justification is internal to the present work. The central claim—that explicit contestable meta-norms can regulate implicit epistemic policies—is advanced as an extension of existing AI ethics practice rather than a tautological restatement of the paper’s own inputs or observations.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The paper introduces a new normative framework whose central claim rests on assumptions about the nature of model bias and the efficacy of constitutional governance rather than on new empirical data or derivations.

axioms (2)

domain assumption Frontier models exhibit identity-stance coherence bias that reflects implicit epistemic policies rather than irreducible architectural features
This is presented as the motivating empirical case that the constitutional proposal is designed to address.
ad hoc to paper Explicit, contestable meta-norms can successfully regulate AI belief-forming behavior
This is the core assumption underlying both the Platonic and Liberal constitutional approaches.

invented entities (1)

Epistemic constitution for AI no independent evidence
purpose: To supply explicit, contestable meta-norms that regulate how AI systems form and express beliefs
Newly proposed governance structure without independent empirical validation beyond the source-bias observation.

pith-pipeline@v0.9.0 · 5487 in / 1461 out tokens · 52876 ms · 2026-05-16T13:59:34.246272+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 2 internal anchors

[1]

Introduction Opus Complete Prompt, Lab Book v4 Draft (~1,020 words), SectionSummary, PatternSummary (patterns 1–5)

work page
[2]

The Finding Sonnet Complete Prompt, SectionSummary S1, PatternSummary S1, Lab Book v4 Draft (~1,806 words), PatternSummary (patterns 6–12)

work page
[3]

The Problem Opus Complete Prompt, SectionSummaries S1–S2, PatternSummaries S1–S2 Draft, PatternSummary (patterns 13–17)

work page
[4]

The Constitution Idea Opus Complete Prompt, SectionSummary S3, PatternSummary S3 Draft, PatternSummary (patterns 18–21)

work page
[5]

Liberal Opus Complete Prompt, SectionSummaries S3–S4, PatternSummary S4 Draft, SectionGuidance S6, PatternSummary (patterns 22–25)

Platonic vs. Liberal Opus Complete Prompt, SectionSummaries S3–S4, PatternSummary S4 Draft, SectionGuidance S6, PatternSummary (patterns 22–25)

work page
[6]

Why Liberal Opus Complete Prompt, SectionSummaries S2 + S5, Draft, SectionGuidance S7, PatternSummary Section Model Primary Inputs Key Outputs PatternSummary S5, SectionGuidance S6 (patterns 26–30)

work page
[7]

Capacities Opus Complete Prompt, SectionSummary S6, PatternSummary S6, SectionGuidance S7 Draft (10 versions, ~1,260 words final), PatternSummary (patterns 31–33)

work page
[8]

Limitations Opus Complete Prompt, SectionSummary S7, PatternSummary S7 Draft (first draft accepted without modification)

work page
[9]

AI should do X

Conclusion Opus Complete Prompt, SectionSummaries S1–S8, PatternSummary S7 Final draft Appendix A Opus* Lab Book v5, topic-specific data files Extended methodology tables Appendix B Opus Process documentation, file structure This appendix *Appendix A initially drafted with Sonnet; rejected due to hallucinated data; redrafted with Opus. B.2.2 Mid-Course Co...

work page
[10]

Pattern 1 (AI Rhetorical Tell Elimination): Systematic removal of discourse markers, hedging patterns, and structural choices characteristic of AI-generated text

work page
[11]

Pattern 7 (Evidence Quality Honesty): Explicit acknowledgment of differential evidence strength rather than rhetorical smoothing

work page
[12]

Pattern 14 (Section Role Over Pattern Application): Recognition that previously established patterns should be adapted to each section’s argumentative function rather than applied mechanically. B.3 Artifacts Generated B.3.1 Complete Artifact Registry Core Writing Artifacts Artifact Category Location Count Status Complete Prompt 02_main_prompt/ 1 Authorita...

work page
[13]

ID fabrication: When unable to locate actual evaluation IDs, the model generated plausible-looking alphanumeric strings that followed the format of real IDs but did not correspond to actual data

work page
[14]

Data smoothing: When source data showed irregular patterns (e.g., two evaluations on the same topic with different effect sizes), the model reported averaged or simplified values rather than the actual distribution

work page
[15]

anti-Chinese bias,

Premature completion: During verification, the model checked only the first instance matching a criterion, declaring verification complete without detecting additional instances. This led to a false report of internal inconsistency in Lab Book v5, which was in fact correct. These failure modes suggest that empirical compilation tasks—even seemingly mechan...

work page 2025
[16]

Staging document: Proposed additions drafted in the Swiss replication project (proposed_paper_edits/swiss_update_proposal.md) for review before modifying paper files

work page
[17]

Branch isolation: All edits made on swiss-update branch, preserving the original paper state on main until final review

work page
[18]

Working files: Edits applied to section-level working files (working/*.md) rather than the assembled paper_full_draft.md, following the original writing workflow

work page
[19]

Modification logs: New entries appended with phase separator and distinct numbering (MOD-SW##) to distinguish from original writing phase

work page
[20]

Epistemic Constitutionalism Or: how to avoid coherence bias

Coherence check: Before applying changes, a systematic coherence check verified internal consistency across the proposal document. This check caught an error in spoilage mechanism attribution (see EpistemicTrace_022), which was corrected before changes were applied. B.7.3 AI Assistance The update was conducted with AI assistance (Claude Opus 4.5 via Claud...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[21]

Lloyd (2025) — Epistemic responsibility framework for human-AI collaborations

work page 2025
[22]

Peters (2024) — Epistemic trust in AI-based science without full transparency

work page 2024
[23]

The Journal of Prompt-Engineered Philosophy Or: How I Started to Track AI Assistance and Stopped Worrying About Slop

Kasirzadeh & Gabriel (2023) — AI alignment and constitutional design The first two were documented in MOD-011 (December 2025) but never actually inserted. Kasirzadeh & Gabriel was a new addition supporting Section 4’s discussion of constitutional approaches to AI alignment. Changes detailed in: 03_modification_logs/ModificationLog_References.md (MOD-012, ...

work page internal anchor Pith review doi:10.48550/arxiv.2511.08639 2023

[1] [1]

Introduction Opus Complete Prompt, Lab Book v4 Draft (~1,020 words), SectionSummary, PatternSummary (patterns 1–5)

work page

[2] [2]

The Finding Sonnet Complete Prompt, SectionSummary S1, PatternSummary S1, Lab Book v4 Draft (~1,806 words), PatternSummary (patterns 6–12)

work page

[3] [3]

The Problem Opus Complete Prompt, SectionSummaries S1–S2, PatternSummaries S1–S2 Draft, PatternSummary (patterns 13–17)

work page

[4] [4]

The Constitution Idea Opus Complete Prompt, SectionSummary S3, PatternSummary S3 Draft, PatternSummary (patterns 18–21)

work page

[5] [5]

Liberal Opus Complete Prompt, SectionSummaries S3–S4, PatternSummary S4 Draft, SectionGuidance S6, PatternSummary (patterns 22–25)

Platonic vs. Liberal Opus Complete Prompt, SectionSummaries S3–S4, PatternSummary S4 Draft, SectionGuidance S6, PatternSummary (patterns 22–25)

work page

[6] [6]

Why Liberal Opus Complete Prompt, SectionSummaries S2 + S5, Draft, SectionGuidance S7, PatternSummary Section Model Primary Inputs Key Outputs PatternSummary S5, SectionGuidance S6 (patterns 26–30)

work page

[7] [7]

Capacities Opus Complete Prompt, SectionSummary S6, PatternSummary S6, SectionGuidance S7 Draft (10 versions, ~1,260 words final), PatternSummary (patterns 31–33)

work page

[8] [8]

Limitations Opus Complete Prompt, SectionSummary S7, PatternSummary S7 Draft (first draft accepted without modification)

work page

[9] [9]

AI should do X

Conclusion Opus Complete Prompt, SectionSummaries S1–S8, PatternSummary S7 Final draft Appendix A Opus* Lab Book v5, topic-specific data files Extended methodology tables Appendix B Opus Process documentation, file structure This appendix *Appendix A initially drafted with Sonnet; rejected due to hallucinated data; redrafted with Opus. B.2.2 Mid-Course Co...

work page

[10] [10]

Pattern 1 (AI Rhetorical Tell Elimination): Systematic removal of discourse markers, hedging patterns, and structural choices characteristic of AI-generated text

work page

[11] [11]

Pattern 7 (Evidence Quality Honesty): Explicit acknowledgment of differential evidence strength rather than rhetorical smoothing

work page

[12] [12]

Pattern 14 (Section Role Over Pattern Application): Recognition that previously established patterns should be adapted to each section’s argumentative function rather than applied mechanically. B.3 Artifacts Generated B.3.1 Complete Artifact Registry Core Writing Artifacts Artifact Category Location Count Status Complete Prompt 02_main_prompt/ 1 Authorita...

work page

[13] [13]

ID fabrication: When unable to locate actual evaluation IDs, the model generated plausible-looking alphanumeric strings that followed the format of real IDs but did not correspond to actual data

work page

[14] [14]

Data smoothing: When source data showed irregular patterns (e.g., two evaluations on the same topic with different effect sizes), the model reported averaged or simplified values rather than the actual distribution

work page

[15] [15]

anti-Chinese bias,

Premature completion: During verification, the model checked only the first instance matching a criterion, declaring verification complete without detecting additional instances. This led to a false report of internal inconsistency in Lab Book v5, which was in fact correct. These failure modes suggest that empirical compilation tasks—even seemingly mechan...

work page 2025

[16] [16]

Staging document: Proposed additions drafted in the Swiss replication project (proposed_paper_edits/swiss_update_proposal.md) for review before modifying paper files

work page

[17] [17]

Branch isolation: All edits made on swiss-update branch, preserving the original paper state on main until final review

work page

[18] [18]

Working files: Edits applied to section-level working files (working/*.md) rather than the assembled paper_full_draft.md, following the original writing workflow

work page

[19] [19]

Modification logs: New entries appended with phase separator and distinct numbering (MOD-SW##) to distinguish from original writing phase

work page

[20] [20]

Epistemic Constitutionalism Or: how to avoid coherence bias

Coherence check: Before applying changes, a systematic coherence check verified internal consistency across the proposal document. This check caught an error in spoilage mechanism attribution (see EpistemicTrace_022), which was corrected before changes were applied. B.7.3 AI Assistance The update was conducted with AI assistance (Claude Opus 4.5 via Claud...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[21] [21]

Lloyd (2025) — Epistemic responsibility framework for human-AI collaborations

work page 2025

[22] [22]

Peters (2024) — Epistemic trust in AI-based science without full transparency

work page 2024

[23] [23]

The Journal of Prompt-Engineered Philosophy Or: How I Started to Track AI Assistance and Stopped Worrying About Slop

Kasirzadeh & Gabriel (2023) — AI alignment and constitutional design The first two were documented in MOD-011 (December 2025) but never actually inserted. Kasirzadeh & Gabriel was a new addition supporting Section 4’s discussion of constitutional approaches to AI alignment. Changes detailed in: 03_modification_logs/ModificationLog_References.md (MOD-012, ...

work page internal anchor Pith review doi:10.48550/arxiv.2511.08639 2023