arxiv: 2606.28294 · v1 · pith:YUNLFGQVnew · submitted 2026-06-26 · 💻 cs.LG · cs.MA

Democratic ICAI: Debating Our Way to Steering Principles from Preferences

Pith reviewed 2026-06-29 04:07 UTC · model grok-4.3

classification 💻 cs.LG cs.MA
keywords inverse constitutional ai · preference alignment · persona debate · steering principles · llm judges · creative benchmarks · democratic icai · pairwise preferences

The pith

Structured persona debates among LLMs extract richer steering principles from pairwise preferences than single-pass summaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Preference alignment methods often convert human choices into natural-language principles to make AI decisions more interpretable, yet single-pass summaries lose the multiple competing factors behind each judgment. Democratic ICAI replaces the single summary step with a structured debate in which distinct personas generate and contest rationales for the same preference pair. The resulting set of rationales feeds into principle extraction, and the extracted principles then guide both LLM-based and decision-tree judges. On the creative-task benchmarks MuCE-Pref and LiTBench, the extracted principles yield higher average accuracy when predicting held-out preferences, and receive stronger approval from separate LLM annotators, than baselines that use deliberative prompting or direct principle induction.

Core claim

By collecting multiple competing rationales through structured persona debate instead of a single summarization pass, Democratic ICAI derives steering principles that yield a more faithful model of the underlying preference structure, improving average prediction accuracy across tasks while generating constitutions that LLM annotators rate more highly than those from deliberative or principle-based baselines.

What carries the argument

Structured persona debate that assembles multiple competing rationales for each pairwise preference before principle extraction.
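
A minimal sketch of how such a debate stage could be wired together, under loudly stated assumptions: the `stub_llm` function, the prompt wording, and the two-round structure are hypothetical placeholders, not the authors' implementation. Only the persona names come from the paper's figure captions. A deterministic stub stands in for the model so the sketch runs offline, and the final clustering step that drafts the constitution is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a real model call (assumption, illustration only)."""
    return f"rationale<{len(prompt)}>"


def persona_rationales(pair: PreferencePair, personas: List[str], ask_llm: LLM) -> Dict[str, str]:
    """Stage 1: each persona independently explains why `chosen` beat `rejected`."""
    return {
        p: ask_llm(
            f"As a {p}, explain why response A was preferred.\n"
            f"Request: {pair.prompt}\nA: {pair.chosen}\nB: {pair.rejected}"
        )
        for p in personas
    }


def debate(rationales: Dict[str, str], ask_llm: LLM, rounds: int = 2) -> List[str]:
    """Stage 2: personas contest the running transcript; every turn is kept."""
    transcript = list(rationales.values())
    for _ in range(rounds):
        for persona in rationales:
            so_far = "\n".join(transcript)
            transcript.append(ask_llm(f"As a {persona}, challenge or defend:\n{so_far}"))
    return transcript


def extract_principles(transcript: List[str], ask_llm: LLM) -> List[str]:
    """Stage 3: turn each debate turn into one candidate principle."""
    return [ask_llm(f"State one general principle behind: {turn}") for turn in transcript]


pair = PreferencePair("Write a short story.", "Story A text", "Story B text")
personas = ["screenwriter", "literary critic", "professor of literature"]
first_pass = persona_rationales(pair, personas, stub_llm)
transcript = debate(first_pass, stub_llm, rounds=2)
principles = extract_principles(transcript, stub_llm)
print(len(first_pass), len(transcript), len(principles))
```

The point of the sketch is the data flow: one preference pair fans out into several rationales, the debate adds contested turns on top, and every turn becomes a candidate principle. A single-pass summary would produce one rationale per pair instead.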

If this is right

  • Steering principles derived from the debate improve average preference prediction accuracy on MuCE-Pref and LiTBench relative to deliberative prompting and principle-based baselines.
  • The same principles produce constitutions that LLM annotators prefer over those from the compared baselines.
  • The derived principles can be used by both LLM-based judges and decision-tree judges to model decisions.
  • The gains appear across multiple categories of creative tasks.
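
To make the accuracy claim concrete, here is a rough sketch of how principle votes could feed a tree-style judge and be scored on held-out pairs. Everything specific is assumed for illustration: the vote encoding (+1 favours A, -1 favours B, 0 abstains), the toy data, and the use of a depth-1 tree (a decision stump) as a stand-in for whatever tree learner the paper uses.

```python
from typing import List, Tuple

Votes = List[int]        # one vote per principle: +1 favours A, -1 favours B, 0 abstains
Label = int              # +1 if annotators preferred A, -1 if they preferred B
Stump = Tuple[int, int]  # (principle index, polarity)


def predict(stump: Stump, votes: Votes) -> Label:
    """Follow one principle's vote; an abstaining principle falls back to A."""
    j, polarity = stump
    return polarity * votes[j] if votes[j] != 0 else 1


def accuracy(stump: Stump, X: List[Votes], y: List[Label]) -> float:
    """Fraction of pairs whose predicted preference matches the label."""
    return sum(predict(stump, v) == label for v, label in zip(X, y)) / len(y)


def fit_stump(X: List[Votes], y: List[Label]) -> Stump:
    """Pick the single principle and polarity with the most correct training predictions."""
    best_stump, best_correct = (0, 1), -1
    for j in range(len(X[0])):
        for polarity in (1, -1):
            candidate = (j, polarity)
            correct = sum(predict(candidate, v) == label for v, label in zip(X, y))
            if correct > best_correct:
                best_stump, best_correct = candidate, correct
    return best_stump


# Toy feature table: rows are preference pairs, columns are three principles.
train_X = [[1, -1, 0], [1, 1, -1], [-1, 1, 0], [-1, -1, 1]]
train_y = [1, 1, -1, -1]
test_X = [[1, 0, -1], [-1, 1, 1], [0, -1, 1]]
test_y = [1, -1, 1]

stump = fit_stump(train_X, train_y)
print(stump, accuracy(stump, test_X, test_y))
```

A real decision-tree judge would split on several principles in sequence; the stump only shows how a principle-by-pair feature table turns into a held-out accuracy number.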

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be tested on non-creative domains such as safety or ethical dilemmas to check whether debate richness remains beneficial outside the evaluated benchmarks.
  • If the debate step scales linearly with the number of personas, it may offer a practical route to richer signals when preference datasets grow large.
  • The approach implicitly treats persona diversity as a proxy for human viewpoint diversity, which could be checked by comparing debate outputs against actual multi-human rationales on the same pairs.

Load-bearing premise

Structured persona debate between LLMs produces a broader and more expressive account of the factors behind each comparison than single-pass summarization without introducing simulation artifacts.

What would settle it

A blind evaluation in which human raters or held-out preference data show no accuracy gain or no preference for the constitutions produced by the debate method over the single-pass baseline.
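
One way such a held-out comparison could be scored is an exact two-sided sign test on the pairs where the two methods disagree. The sketch below is illustrative only: the function name and the example counts are invented, and this summary does not say which significance test, if any, the paper uses.

```python
from math import comb


def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test on discordant held-out pairs.

    wins_a: pairs that only method A predicts correctly.
    wins_b: pairs that only method B predicts correctly.
    Pairs where both methods are right, or both wrong, carry no signal and are excluded.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Chance of a split at least this lopsided under a fair coin, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented counts, purely to show the call.
print(sign_test_p(8, 2))
print(sign_test_p(5, 5))
```

A large p-value on a reasonably sized held-out set would be the "no accuracy gain" outcome described above; a small one would not by itself show that the gain reflects human-relevant factors.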

Figures

Figures reproduced from arXiv: 2606.28294 by Anish Natekar, Ashutosh Ranjan, Kevin Kingslin, Savita Bhat, Shirish Karande, Vivek Srivastava.

Figure 1: Architecture of Democratic ICAI. A committee of domain-expert personas first generates detailed rationales for each preference pair. These rationales are then subjected to an adversarial debate procedure, through which the evaluative principles relevant to each comparison are surfaced. Finally, the full collection of principles is clustered and abstracted to draft a concise, human-readable constitution.
Figure 2: Distribution of average semantic distance between principles within a constitution. For each method, the distance is computed as the average cosine distance of a principle from all other principles in the constitution. Lower values indicate reduced diversity (narrower constitutional scope), while higher values reflect greater conceptual separation and normative breadth.
Figure 3: Qualitative comparison of constitution for Stories (GPT-4o). ICAI repeatedly emphasizes overlapping character-growth, emotion, reflection, and description criteria. In contrast, Democratic ICAI distributes principles across character, moral reasoning, narrative structure, setting, dialogue, symbolism, tone, and interpretation, indicating broader coverage.
Figure 4: Comparison of Democratic ICAI and ICAI across five dimensions using Qwen-2.5-32B. Each subplot reports preference shares across ten datasets. Democratic ICAI consistently outperforms ICAI on structural criteria such as generality and coherence, while remaining competitive on feasibility
Figure 5: Screenwriter expert persona prompt for reasoning and debate agents
Figure 6: Professor of literature expert persona prompt for reasoning and debate agents
Figure 7: Literary critic expert persona prompt for reasoning and debate agents.
Figure 8: Innovation consultant expert persona prompt for reasoning and debate agents
Figure 9: Design engineer expert persona prompt for reasoning and debate agents
Figure 10: Reasoning assembly prompt using chain of thought strategy.
Figure 11: Reasoning assembly prompt using reflective justification strategy.
Figure 12: Reasoning assembly prompt using self consistency strategy.
Figure 13: Chain of Thought Prompt
Figure 14: Chain of Thought Self Consistency Prompt
Figure 15: Tree of Thought Thought Generation Prompt
Figure 16: Tree of Thought Thought Evaluation Prompt
Figure 17: Self Refine Initial Generation Prompt
Figure 18: Self Refine Feedback Prompt
Figure 19: Self Refine Refinement Prompt
Figure 20: Judge agent system prompt for parliamentary debate.
Figure 21: Debater system prompt for parliamentary debate.
Figure 22: Comparison of Democratic ICAI and ICAI across five dimensions under GPT-4o (left column) and …
Figure 23: Prompt used for comparative analysis of constitutions.
Figure 24: LLM prompt for feature table construction.
Figure 25: ICAI prompt for annotating according to constitution (Alpaca Eval variant).
Figure 26: Example of a story from LitBench
Figure 27: Example of a story from LitBench
Figure 28: Distribution of average semantic distance across tasks for ICAI and Democratic ICAI. Average semantic …
Figure 29: Preference accuracy comparison between Democratic ICAI and ICAI with Decision Tree Judge (GPT-4o).
Figure 30: Prompt used by the external auditor (Qwen2.5-32B-Instruct) to evaluate each induced principle along …
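
The diversity metric described in the Figure 2 caption (for each principle, the average cosine distance to every other principle in the same constitution) can be sketched as follows. The embedding vectors are made-up 3-dimensional toys; this summary does not name the paper's embedding model, so any real use would need to substitute one.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def avg_distance_per_principle(embeddings: List[Sequence[float]]) -> List[float]:
    """For each principle, average its cosine distance to all other principles.

    Requires at least two principles.
    """
    n = len(embeddings)
    return [
        sum(cosine_distance(embeddings[i], embeddings[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Toy embeddings: the first two principles coincide, the third is orthogonal to both.
toy = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(avg_distance_per_principle(toy))
```

Under this reading, a constitution full of near-duplicate principles yields values close to zero, which matches the caption's "reduced diversity" interpretation.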
Original abstract

Preference-based alignment often struggles to capture the reasoning that underlies human judgments. Many evaluations rely on multiple interacting criteria, yet pairwise labels reveal only the final choice rather than the considerations that shape preferences. Inverse Constitutional AI (ICAI) improves interpretability in decision making by summarizing preferences into natural-language principles, but its single-pass explanations miss much of the nuance involved in complex decisions. We introduce Democratic ICAI, a novel approach that gathers multiple competing rationales through structured persona debate, offering a broader and more expressive account of the factors influencing each comparison. From these richer signals, we derive clearer and more comprehensive steering principles and use them to guide decision modeling through both LLM-based and decision-tree judges. Experiments on creative preference benchmarks, MuCE-Pref and LiTBench, across multiple creative task categories show that Democratic ICAI yields a more faithful preference structure. It improves average preference prediction across tasks relative to deliberative prompting and principle-based baselines, while producing constitutions that LLM annotators prefer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Democratic ICAI, extending Inverse Constitutional AI by using structured persona debates among LLMs to collect multiple competing rationales for pairwise preferences. These richer signals are used to derive steering principles that guide decision modeling via both LLM-based and decision-tree judges. Experiments on the MuCE-Pref and LiTBench creative preference benchmarks report higher average preference prediction accuracy than deliberative prompting and principle-based baselines, along with constitutions that LLM annotators prefer.

Significance. If the gains can be shown to reflect factors that matter to human judges rather than LLM-internal artifacts, the method would strengthen the interpretability of preference-derived principles for multi-criteria alignment tasks. The core idea of moving beyond single-pass summarization is a natural and potentially useful direction.

major comments (2)
  1. [Experiments] Experiments section (and abstract): preference-prediction gains are measured exclusively with LLM annotators and judges drawn from the same model family that performs persona generation, debate, rationale extraction, and principle derivation. This closed loop means observed lifts could arise from verbosity, consistency, or length biases rather than from a demonstrably more faithful account of the underlying factors; no human-rationale baseline or cross-validation is reported.
  2. [Method] Method description of structured persona debate: the claim that the debate step produces a 'broader and more expressive account' without simulation artifacts is load-bearing for attributing the MuCE-Pref and LiTBench improvements to the proposed mechanism, yet the manuscript provides no direct test (e.g., comparison of extracted rationales against human reasoning traces) that would separate this effect from prompt-engineering artifacts.
minor comments (1)
  1. [Abstract] The abstract states improvements 'across multiple creative task categories' but does not enumerate the categories or the number of tasks per benchmark, which would help readers assess the scope of the reported gains.
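
The length-bias concern in the first major comment suggests a simple diagnostic: how often does a judge pick the longer response? A rough sketch follows, with invented records and a hypothetical `longer_pick_rate` helper. Length is measured in characters here as an assumption; token counts would work the same way.

```python
from typing import List, Tuple

# Each record: (response A text, response B text, judge's choice "A" or "B").
Record = Tuple[str, str, str]


def longer_pick_rate(records: List[Record]) -> float:
    """Fraction of unequal-length pairs where the judge chose the longer response."""
    picks, total = 0, 0
    for a, b, choice in records:
        if len(a) == len(b):
            continue  # equal lengths say nothing about length bias
        total += 1
        longer = "A" if len(a) > len(b) else "B"
        picks += choice == longer
    return picks / total if total else float("nan")


# Invented records, purely to show the call.
records = [
    ("short", "a much longer response", "B"),
    ("a fairly long answer here", "brief", "A"),
    ("tiny", "somewhat longer", "A"),
    ("same", "size", "A"),
]
print(longer_pick_rate(records))
```

A rate far above 0.5 would hint at length bias but would not prove it, since longer responses may also be genuinely better; comparing the rate against the human labels on the same pairs would be the more informative check.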

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, acknowledging where our current experiments leave open questions about generalizability beyond LLM judges.

Point-by-point responses
  1. Referee: Experiments section (and abstract): preference-prediction gains are measured exclusively with LLM annotators and judges drawn from the same model family that performs persona generation, debate, rationale extraction, and principle derivation. This closed loop means observed lifts could arise from verbosity, consistency, or length biases rather than from a demonstrably more faithful account of the underlying factors; no human-rationale baseline or cross-validation is reported.

    Authors: We agree this is a substantive limitation: all reported gains rely on LLM judges from the same model family, so we cannot rule out that improvements partly reflect model-internal biases rather than more faithful capture of human preference factors. The manuscript will be revised to state this limitation explicitly in the Experiments and Limitations sections and to add cross-family validation (e.g., using a held-out model family for final judging). We did not collect human rationale baselines or conduct human preference studies, so those comparisons are not available.
    Revision: partial

  2. Referee: Method description of structured persona debate: the claim that the debate step produces a 'broader and more expressive account' without simulation artifacts is load-bearing for attributing the MuCE-Pref and LiTBench improvements to the proposed mechanism, yet the manuscript provides no direct test (e.g., comparison of extracted rationales against human reasoning traces) that would separate this effect from prompt-engineering artifacts.

    Authors: The design of the persona debate aims to surface competing rationales by construction, and the consistent accuracy lifts over single-pass baselines provide indirect support. However, we accept that without direct comparison of the extracted rationales to human reasoning traces we cannot fully isolate the contribution of the debate mechanism from prompt artifacts. The revision will add an explicit discussion of this point in the Method and Limitations sections. No human reasoning traces were collected in the present study.
    Revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes an empirical procedure for extracting principles via LLM persona debate and evaluates the resulting constitutions on the external benchmarks MuCE-Pref and LiTBench using both LLM-based and decision-tree judges. No equations, fitted parameters, or self-citations are presented that reduce a claimed prediction or first-principles result to the input data by construction. The comparative gains in preference prediction are reported against baselines rather than being definitionally forced by the evaluation method itself. The LLM-judge component is a methodological choice whose validity can be assessed externally; it does not create a self-definitional or load-bearing self-citation loop within the reported derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that LLMs can faithfully simulate diverse personas whose debates surface the true interacting criteria behind preferences; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption: LLMs can simulate diverse personas whose structured debates accurately reflect the multiple interacting criteria in human judgments
    Invoked when the method gathers 'multiple competing rationales through structured persona debate' to produce richer signals than single-pass ICAI.

pith-pipeline@v0.9.1-grok · 5716 in / 1434 out tokens · 67528 ms · 2026-06-29T04:07:44.790651+00:00 · methodology
