Six Llamas: Comparative Religious Ethics Through LoRA-Adapted Language Models

Ali Dasdan; Chad Coleman; Manan Shah; Matthew Crispi; Morris Chiang; Mustafa Poonawala; W. Russell Neuman; Zack Leitman

arxiv: 2604.18404 · v1 · submitted 2026-04-20 · 💻 cs.AI

Chad Coleman, W. Russell Neuman, Manan Shah, Ali Dasdan, Matthew Crispi, Morris Chiang, Zack Leitman, Mustafa Poonawala

Pith reviewed 2026-05-10 04:47 UTC · model grok-4.3

classification 💻 cs.AI

keywords LoRA fine-tuning, religious ethics, ethical reasoning, language models, comparative religion, moral dilemmas, model adaptation, temperature sampling

The pith

LoRA-adapted language models encode ethical reasoning patterns aligned with their specific religious training traditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models fine-tuned solely on texts from one religious tradition will respond to ethical questions in ways that reflect that tradition's moral logic. By creating five specialized versions of the same base model and comparing their answers to the same set of dilemmas and policy questions, it checks if these adaptations produce distinct, interpretable, and stable ethical profiles. A sympathetic reader would care because this approach turns AI models into measurable instruments for studying how different moral systems differ in practice, rather than relying only on textual interpretation. The work also tracks how much the responses change with sampling temperature, finding stability on clear-cut cases but more variation on contested ones.

Core claim

Six variants of the Meta-Llama-3.1-8B model were created: an unmodified control and five others each adapted via LoRA exclusively on the sacred and theological texts of one major tradition (Christianity, Islam, Judaism, Hinduism, or Buddhism). When all six were given the same 17 ethical prompts covering dilemmas, scenarios, policy questions, and self-assessments, the adapted models generated responses that differed systematically from the base model and aligned with the moral emphases of their training corpora. These patterns held across multiple temperature settings, with full consistency on the Trolley Problem but increasing divergence on morally contested questions at higher temperatures.

What carries the argument

LoRA adaptation of a shared base model on tradition-specific sacred texts, used as instruments to probe and compare ethical reasoning through standardized prompt batteries and multi-temperature sampling.
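
As a rough illustration of the mechanism, LoRA keeps the base weight matrix frozen and learns a low-rank correction, so the effective weight is W + (alpha / r) * B A. The sketch below uses plain Python lists and toy dimensions. The rank, alpha, and matrix values are illustrative assumptions, not the paper's settings, which this summary does not report.

```python
# Toy illustration of a LoRA update: W_eff = W + (alpha / r) * B @ A.
# All shapes and values are made up for illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, b, a, alpha):
    """Return the frozen weight W plus the scaled low-rank update B @ A."""
    r = len(a)  # rank = number of rows in A (= number of columns in B)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen 2x3 base weight, shared by every variant.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
# Hypothetical rank-1 adapter: B is 2x1, A is 1x3.
B = [[1.0], [2.0]]
A = [[0.5, 0.0, -0.5]]

W_eff = lora_effective_weight(W, B, A, alpha=2.0)
```

In this picture each tradition-specific variant would differ from the control only in its pair (B, A); when B is all zeros, the adapted weights equal the base weights.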

If this is right

Core ethical positions on high-consensus dilemmas remain stable across temperature variations.
Tradition-specific differences become more pronounced at higher temperatures in morally contested domains.
The base model shows the highest overall response consistency at 88.3 percent mean.
LoRA adaptation adds both tradition-specific signal and greater sampling sensitivity.
The method serves as a proof-of-concept for using differentially trained models in comparative cultural analysis.
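
The consistency figures above depend on how "consistency" is scored, which this summary does not spell out. A minimal sketch, assuming it means the share of repeated runs that match the most common answer for a prompt, averaged over prompts. The function names and the coded answers are hypothetical.

```python
from collections import Counter

def response_consistency(responses):
    """Share of runs that match the most common (modal) response."""
    if not responses:
        raise ValueError("need at least one response")
    modal_count = Counter(responses).most_common(1)[0][1]
    return modal_count / len(responses)

def mean_consistency(runs_by_prompt):
    """Average the per-prompt consistency over all prompts."""
    scores = [response_consistency(r) for r in runs_by_prompt.values()]
    return sum(scores) / len(scores)

# Hypothetical coded answers from repeated runs of one model.
runs = {
    "trolley": ["pull", "pull", "pull", "pull"],
    "policy_q": ["support", "oppose", "support", "support"],
}
trolley_score = response_consistency(runs["trolley"])  # every run agrees
overall = mean_consistency(runs)
```

Under this assumed definition, a prompt where every run gives the same answer scores 1.0, and a single dissenting run out of four pulls that prompt down to 0.75.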

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This setup could be extended to compare ethical reasoning across political ideologies or philosophical schools by training on their respective texts.
Findings suggest that AI models might serve as proxies for testing how cultural training affects decision-making under uncertainty.
Future work could test whether these ethical patterns persist when models are further adapted or when prompts are translated into different languages.
The increased sensitivity at high temperatures might indicate that fine-tuning amplifies the model's reliance on training data distributions in ambiguous cases.
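
For readers less familiar with the sampling side, the sketch below shows the standard temperature-scaled softmax: logits are divided by the temperature before normalization, so higher temperatures flatten the distribution and give less-favored answers more probability. The logits are invented for illustration and are not taken from the paper.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate answers.
logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, 0.5)   # sharper distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution
```

If an adapted model's top two answers are close in logit space on a contested prompt, raising the temperature would flip its sampled answer more often, which is one mechanical reading of the sensitivity described above.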

Load-bearing premise

Fine-tuning exclusively on sacred and theological texts via LoRA faithfully encodes the ethical reasoning patterns of each tradition without leftover influence from the base model's original training or the specific wording of the prompts.

What would settle it

If the five adapted models produce ethical responses that are statistically indistinguishable from the base model or from each other across the prompt set, or if their answers fail to match documented positions from their respective traditions on standard dilemmas.

Original abstract

We present Six Llamas, a comparative study examining whether large language models fine-tuned on distinct religious corpora encode systematically different patterns of ethical reasoning. Six variants of Meta-Llama-3.1-8B are constructed: one unmodified control and five LoRA-adapted models trained exclusively on the sacred and theological texts of Christianity, Islam, Judaism, Hinduism, or Buddhism. All six models are probed with an identical battery of 17 standardized ethical prompts spanning moral dilemmas, game-theoretic scenarios, public policy questions, and moral-psychological self-assessments. To assess robustness and reproducibility, we implement a multi-temperature sampling design spanning ten temperature settings. We compute response consistency metrics, pairwise inter-model agreement rates, temperature sensitivity coefficients across four prompt domains, and run-to-run stability analyses. Findings show that LoRA-adapted models produce ethical reasoning patterns that are (a) systematically differentiated from the base model, (b) consistent with the moral logics of their training traditions, (c) structured along interpretable dimensions in moral-philosophical space, (d) core ethical positions remain stable across temperature variations for high-consensus dilemmas. The Trolley Problem achieves 100% consistency across all models and temperatures, while (e) tradition-specific divergence intensifies at higher temperatures in morally contested domains, and (f) the base model exhibits the highest overall response consistency (mean 88.3%), suggesting LoRA adaptation introduces both tradition-specific signal and increased sampling sensitivity. The study offers a proof-of-concept for the condensate comparative method using differentially trained language models as instruments for cultural and ethical analysis and identifies specific criteria for falsification and planned extensions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper runs LoRA-tuned Llamas on religious texts and finds differentiated ethical outputs, but the design leaves open whether those differences trace to the traditions or just to fine-tuning in general.

The core move here is straightforward: start with Llama-3.1-8B, make five LoRA copies each trained only on the sacred texts of one tradition, keep one unmodified control, and hit all six with the same 17 prompts across ten temperatures. They track consistency, inter-model agreement, and temperature sensitivity, then claim the adapted models diverge from the base, line up with their source traditions on moral logic, stay stable on clear-cut cases like the trolley problem, and diverge more on contested issues at higher temperatures. The base model comes out most consistent overall.

That multi-temperature sweep and the explicit control are useful additions for a proof-of-concept in this space, and the choice to focus on five major traditions gives the comparison some breadth without overclaiming universality. The numbers they report on the trolley problem and the base-model consistency edge are concrete enough to be checkable in principle.

The main weakness is exactly the one the stress-test flags. Nothing in the setup isolates whether the observed patterns come from the religious corpora themselves or from the general effect of LoRA adaptation plus whatever religious material was already in the base pre-training. A matched-length neutral-text fine-tune or a non-ethical prompt set would have helped rule that out, and the abstract does not mention one. Without that, the claim that the outputs are “consistent with the moral logics of their training traditions” rests on post-hoc interpretation rather than a controlled contrast.

The paper is aimed at computational social scientists and AI-ethics researchers who already use model adaptation as a measurement tool. A reader in that group could extract the prompt set and the temperature protocol for their own work, but anyone expecting a clean demonstration that LoRA captures tradition-specific ethics will need the missing controls before treating the results as settled.

It is worth sending to referees; the experimental skeleton is sound enough that targeted revisions could make the central claim testable rather than suggestive.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces 'Six Llamas,' constructing five LoRA-adapted variants of Meta-Llama-3.1-8B trained exclusively on sacred and theological texts from Christianity, Islam, Judaism, Hinduism, and Buddhism, plus an unmodified base-model control. All six models are evaluated on an identical set of 17 ethical prompts (moral dilemmas, game-theoretic scenarios, policy questions, and self-assessments) using multi-temperature sampling across ten temperature values. The authors compute response consistency metrics, pairwise inter-model agreement, temperature sensitivity coefficients by domain, and run-to-run stability. They claim that the adapted models show (a) systematic differentiation from the base, (b) consistency with each tradition's moral logics, (c) interpretable structure in moral-philosophical space, (d) stability of core positions on high-consensus dilemmas such as the Trolley Problem (reported at 100% consistency), (e) increased divergence at higher temperatures in contested domains, and (f) lower overall consistency than the base model, which is the most consistent of the six (mean 88.3%). They propose the 'condensate comparative method' as a proof-of-concept for using differentially fine-tuned LLMs in cultural analysis.
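
Pairwise inter-model agreement, one of the metrics listed in the summary, can be sketched as the fraction of prompts on which two models give the same coded answer. This is an assumed operationalization, since the exact definition is not given here. The model names and answers below are placeholders.

```python
from itertools import combinations

def agreement_rate(answers_a, answers_b):
    """Fraction of prompts on which two models give the same answer."""
    if len(answers_a) != len(answers_b):
        raise ValueError("both models must answer the same prompts")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)

def pairwise_agreement(answers_by_model):
    """Agreement rate for every unordered pair of models."""
    return {
        (m1, m2): agreement_rate(answers_by_model[m1], answers_by_model[m2])
        for m1, m2 in combinations(sorted(answers_by_model), 2)
    }

# Hypothetical coded answers to four prompts.
answers = {
    "base": ["A", "B", "A", "A"],
    "model_x": ["A", "B", "B", "A"],
    "model_y": ["A", "A", "B", "A"],
}
rates = pairwise_agreement(answers)
```

With six models this yields 15 pairs, and comparing adapted-versus-base pairs against adapted-versus-adapted pairs is one way to read claim (a).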

Significance. If the central claims hold after addressing controls, the work supplies a reproducible experimental framework for comparative religious ethics that leverages existing LLM infrastructure. The multi-temperature design, explicit consistency metrics, and run-to-run stability analyses constitute genuine strengths that support falsifiability and reproducibility. The approach could serve as an instrument for testing hypotheses about tradition-specific reasoning patterns, provided the adaptation effects can be isolated from base-model priors.

major comments (3)

[§3] §3 (Methods): The LoRA adaptation uses only sacred/theological corpora, yet no ablation is reported that applies LoRA to a matched-length, matched-vocabulary non-ethical or neutral corpus. Without this control, it is impossible to determine whether the reported differentiation and tradition alignment arise from the specific moral content or from generic effects of domain adaptation on the base Llama-3.1-8B's pre-training distribution. This directly undermines claim (b) and the interpretation of 'moral logics.'
[§4] §4 (Results): The abstract and findings assert that adapted models are 'consistent with the moral logics of their training traditions' and 'structured along interpretable dimensions,' but the reported metrics are limited to aggregate consistency rates and inter-model agreement; no quantitative alignment scores, example response excerpts, or mapping to established moral-philosophical frameworks (e.g., deontology vs. utilitarianism) are supplied to ground these interpretations. The 100% Trolley Problem consistency is noted, yet no statistical test or prompt-bias control is described to confirm it exceeds chance.
[§4.2] §4.2 (Temperature analysis): The claim that 'tradition-specific divergence intensifies at higher temperatures in morally contested domains' is load-bearing for the robustness argument, but the temperature sensitivity coefficients are presented without per-prompt variance decomposition or comparison against a null model of random sampling; this leaves open whether the pattern reflects genuine ethical structure or simply increased stochasticity after adaptation.
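
To make the null-model request in the last comment concrete, the sketch below estimates how much modal-answer consistency pure chance would produce if every run picked uniformly among the available options. The number of runs, the number of options, and the uniform-choice null are all assumptions for illustration; the paper's generation counts are not stated here.

```python
import random
from collections import Counter

def modal_share(responses):
    """Share of responses equal to the most common one."""
    return Counter(responses).most_common(1)[0][1] / len(responses)

def null_consistency(n_runs, n_options, n_sim=2000, seed=0):
    """Mean modal share when each run picks an option uniformly at random."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    total = 0.0
    for _ in range(n_sim):
        draws = [rng.randrange(n_options) for _ in range(n_runs)]
        total += modal_share(draws)
    return total / n_sim

# Hypothetical setting: 10 runs per prompt, 2 answer options.
baseline = null_consistency(n_runs=10, n_options=2)
```

Because the modal share of a two-option prompt can never fall below 0.5, even random answers look moderately "consistent", so observed rates are more informative when reported against a baseline of this kind.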

minor comments (3)

[§2] The term 'condensate comparative method' is introduced without a formal definition or citation to prior methodological literature; a brief operationalization in §2 would improve clarity.
[§3.1] The manuscript would benefit from an explicit statement of the exact LoRA hyperparameters (rank, alpha, target modules) and training corpus sizes in §3.1 to allow replication.
[Figures/Tables] Figure captions and table legends should include the precise number of generations per prompt-temperature pair to clarify how consistency metrics were computed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important areas for strengthening the manuscript's controls and interpretive grounding. We address each major comment below and will incorporate revisions accordingly.

Referee: [§3] §3 (Methods): The LoRA adaptation uses only sacred/theological corpora, yet no ablation is reported that applies LoRA to a matched-length, matched-vocabulary non-ethical or neutral corpus. Without this control, it is impossible to determine whether the reported differentiation and tradition alignment arise from the specific moral content or from generic effects of domain adaptation on the base Llama-3.1-8B's pre-training distribution. This directly undermines claim (b) and the interpretation of 'moral logics.'

Authors: We agree that the absence of a neutral-corpus ablation limits the ability to fully isolate moral content from generic domain-adaptation effects. While inter-tradition differentiation and comparison to the unmodified base model provide supporting evidence for claim (b), a matched neutral control would offer stronger confirmation. We will conduct this ablation and report the results in the revised manuscript.
revision: yes

Referee: [§4] §4 (Results): The abstract and findings assert that adapted models are 'consistent with the moral logics of their training traditions' and 'structured along interpretable dimensions,' but the reported metrics are limited to aggregate consistency rates and inter-model agreement; no quantitative alignment scores, example response excerpts, or mapping to established moral-philosophical frameworks (e.g., deontology vs. utilitarianism) are supplied to ground these interpretations. The 100% Trolley Problem consistency is noted, yet no statistical test or prompt-bias control is described to confirm it exceeds chance.

Authors: We acknowledge that the current presentation relies on aggregate metrics and would benefit from more explicit grounding. In revision we will add representative response excerpts, introduce quantitative alignment measures (e.g., semantic similarity to canonical positions), and include a preliminary mapping to deontological versus consequentialist tendencies. For the Trolley Problem we will add a binomial test against chance and vary prompt phrasing to address bias.
revision: yes

Referee: [§4.2] §4.2 (Temperature analysis): The claim that 'tradition-specific divergence intensifies at higher temperatures in morally contested domains' is load-bearing for the robustness argument, but the temperature sensitivity coefficients are presented without per-prompt variance decomposition or comparison against a null model of random sampling; this leaves open whether the pattern reflects genuine ethical structure or simply increased stochasticity after adaptation.

Authors: We agree that a null-model comparison is needed to distinguish structured divergence from adaptation-induced stochasticity. We will add a per-prompt variance decomposition and a comparison against simulated random sampling from the base model distribution. These analyses will be included in the revised temperature-sensitivity section.
revision: yes
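
The binomial test mentioned in the second response can be sketched as an exact one-sided tail probability. The run count of 10 and the chance rate of 0.5 are assumptions chosen for illustration (a two-option dilemma answered at random), not values taken from the paper.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact test."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical case: 10 of 10 runs give the same answer, chance rate 0.5.
p_value = binomial_tail(k=10, n=10, p=0.5)
```

The test treats runs as independent draws. Repeated samples from one model at nearby temperatures are unlikely to be fully independent, so a p-value computed this way is best read as an optimistic bound.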

Circularity Check

0 steps flagged

No significant circularity in empirical observational study

The paper presents an empirical study: five LoRA adaptations on distinct religious corpora, an unmodified control, and a fixed battery of 17 prompts evaluated via consistency metrics, inter-model agreement, temperature sensitivity, and run-to-run stability. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems appear in the described derivation chain. Claims (a)–(f) are framed as direct observational outcomes from the probing design rather than reductions to inputs by construction. The study is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard machine-learning assumptions about LoRA specialization and prompt-based elicitation; the only named addition is the 'condensate comparative method' label for the overall approach.

axioms (1)

domain assumption: LoRA adaptation on religious corpora encodes tradition-specific ethical reasoning without introducing artifacts or residual base-model influence
Implicit in the claim that adapted models are 'consistent with the moral logics of their training traditions'.

invented entities (1)

condensate comparative method (no independent evidence)
purpose: Using differentially trained language models as instruments for cultural and ethical analysis
Presented as a proof-of-concept name for the overall experimental design.

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

The LoRA-adapted models do not produce random or arbitrary ethical responses

Discussion 6.1 Interpretation of Findings These preliminary findings illustrate the core claim of the condensate comparative method: that differentially trained language models encode recoverable, tradition-consistent patterns of ethical reasoning that can be probed through standardized instruments. The LoRA-adapted models do not produce random or arbitra...

work page 2011
[2]

Push" to

Limitations and Future Work Several limitations of the current design should be noted. Corpus size asymmetry presents a challenge. The corpora varies substantially, from 5.1M tokens (Islam) to 51.6M tokens (Christianity), introducing a potential confound. The finding that the largest corpus (Christianity) produces MFT rankings identical to the base model ...

work page 2025
[3]

Religiosity predicts prosociality, especially when measured by self-report: A meta-analysis of almost 60 years of research

Conclusion This paper presents the Six Llamas study: a comparative condensate experiment in which five LoRA-adapted variants of Meta-Llama-3.1-8B-Instruct. Each is fine-tuned on the sacred and theological texts of a distinct religious tradition and are evaluated against an unmodified control using a battery of 17 standardized ethical prompts. Preliminary ...

work page doi:10.2307/j.ctt32bbp0.12 2023
