Six Llamas: Comparative Religious Ethics Through LoRA-Adapted Language Models
Pith reviewed 2026-05-10 04:47 UTC · model grok-4.3
The pith
LoRA-adapted language models encode ethical reasoning patterns aligned with their specific religious training traditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Six variants of the Meta-Llama-3.1-8B model were created: an unmodified control and five others each adapted via LoRA exclusively on the sacred and theological texts of one major tradition (Christianity, Islam, Judaism, Hinduism, or Buddhism). When all six were given the same 17 ethical prompts covering dilemmas, scenarios, policy questions, and self-assessments, the adapted models generated responses that differed systematically from the base model and aligned with the moral emphases of their training corpora. These patterns held across multiple temperature settings, with full consistency on the Trolley Problem but increasing divergence on morally contested questions at higher temperatures.
What carries the argument
LoRA adaptation of a shared base model on tradition-specific sacred texts, used as instruments to probe and compare ethical reasoning through standardized prompt batteries and multi-temperature sampling.
If this is right
- Core ethical positions on high-consensus dilemmas remain stable across temperature variations.
- Tradition-specific differences become more pronounced at higher temperatures in morally contested domains.
- The base model shows the highest overall response consistency, with a mean of 88.3 percent.
- LoRA adaptation adds both tradition-specific signal and greater sampling sensitivity.
- The method serves as a proof-of-concept for using differentially trained models in comparative cultural analysis.
Where Pith is reading between the lines
- This setup could be extended to compare ethical reasoning across political ideologies or philosophical schools by training on their respective texts.
- Findings suggest that AI models might serve as proxies for testing how cultural training affects decision-making under uncertainty.
- Future work could test whether these ethical patterns persist when models are further adapted or when prompts are translated into different languages.
- The increased sensitivity at high temperatures might indicate that fine-tuning amplifies the model's reliance on training data distributions in ambiguous cases.
Load-bearing premise
Fine-tuning exclusively on sacred and theological texts via LoRA faithfully encodes the ethical reasoning patterns of each tradition without leftover influence from the base model's original training or the specific wording of the prompts.
What would settle it
The claim would be undermined if the five adapted models produced ethical responses that were statistically indistinguishable from the base model or from each other across the prompt set, or if their answers failed to match documented positions of their respective traditions on standard dilemmas.
Original abstract
We present Six Llamas, a comparative study examining whether large language models fine-tuned on distinct religious corpora encode systematically different patterns of ethical reasoning. Six variants of Meta-Llama-3.1-8B are constructed: one unmodified control and five LoRA-adapted models trained exclusively on the sacred and theological texts of Christianity, Islam, Judaism, Hinduism, or Buddhism. All six models are probed with an identical battery of 17 standardized ethical prompts spanning moral dilemmas, game-theoretic scenarios, public policy questions, and moral-psychological self-assessments. To assess robustness and reproducibility, we implement a multi-temperature sampling design spanning ten temperature settings. We compute response consistency metrics, pairwise inter-model agreement rates, temperature sensitivity coefficients across four prompt domains, and run-to-run stability analyses. Findings show that LoRA-adapted models produce ethical reasoning patterns that are (a) systematically differentiated from the base model, (b) consistent with the moral logics of their training traditions, and (c) structured along interpretable dimensions in moral-philosophical space. In addition, (d) core ethical positions remain stable across temperature variations for high-consensus dilemmas (the Trolley Problem achieves 100% consistency across all models and temperatures), (e) tradition-specific divergence intensifies at higher temperatures in morally contested domains, and (f) the base model exhibits the highest overall response consistency (mean 88.3%), suggesting that LoRA adaptation introduces both tradition-specific signal and increased sampling sensitivity. The study offers a proof-of-concept for the condensate comparative method, which uses differentially trained language models as instruments for cultural and ethical analysis, and identifies specific criteria for falsification and planned extensions.
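The abstract names response consistency and pairwise inter-model agreement but this page does not reproduce their definitions. The sketch below shows one plausible reading, offered as an assumption rather than the authors' formulas: consistency is the share of repeated runs that match the most common answer, and agreement is the share of prompts on which two models' most common answers coincide. The function names, model labels, and answer strings are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def modal_answer(responses):
    """Most common answer among repeated runs of one prompt."""
    return Counter(responses).most_common(1)[0][0]


def consistency(responses):
    """Share of runs that match the modal answer (assumed definition)."""
    top_count = Counter(responses).most_common(1)[0][1]
    return top_count / len(responses)


def pairwise_agreement(runs_by_model):
    """Share of shared prompts on which two models give the same modal answer.

    runs_by_model maps model name -> {prompt_id: [coded answers]}.
    """
    rates = {}
    for a, b in combinations(sorted(runs_by_model), 2):
        shared = sorted(set(runs_by_model[a]) & set(runs_by_model[b]))
        hits = sum(
            modal_answer(runs_by_model[a][p]) == modal_answer(runs_by_model[b][p])
            for p in shared
        )
        rates[(a, b)] = hits / len(shared)
    return rates


# Hypothetical coded answers, not data from the paper.
runs = {
    "base": {"trolley": ["pull"] * 4, "lying": ["no", "no", "no", "yes"]},
    "buddhism": {"trolley": ["pull"] * 4, "lying": ["yes", "yes", "no", "yes"]},
}
print(consistency(runs["base"]["lying"]))
print(pairwise_agreement(runs))
```

Both measures assume that free-text responses have already been coded into discrete answer labels; how that coding is done is a separate methodological choice that this sketch leaves open.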
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces 'Six Llamas,' constructing five LoRA-adapted variants of Meta-Llama-3.1-8B trained exclusively on sacred and theological texts from Christianity, Islam, Judaism, Hinduism, and Buddhism, plus an unmodified base-model control. All six models are evaluated on an identical set of 17 ethical prompts (moral dilemmas, game-theoretic scenarios, policy questions, and self-assessments) using multi-temperature sampling across ten temperature values. The authors compute response consistency metrics, pairwise inter-model agreement, temperature sensitivity coefficients by domain, and run-to-run stability. They claim that the adapted models yield (a) systematic differentiation from the base, (b) consistency with each tradition's moral logics, (c) interpretable structure in moral-philosophical space, (d) stability of core positions on high-consensus dilemmas such as the Trolley Problem (reported at 100% consistency), (e) increased divergence at higher temperatures in contested domains, and (f) lower overall consistency than the base model, whose mean consistency of 88.3% is the highest of the six. They also propose the 'condensate comparative method' as a proof-of-concept for using differentially fine-tuned LLMs in cultural analysis.
Significance. If the central claims hold after addressing controls, the work supplies a reproducible experimental framework for comparative religious ethics that leverages existing LLM infrastructure. The multi-temperature design, explicit consistency metrics, and run-to-run stability analyses constitute genuine strengths that support falsifiability and reproducibility. The approach could serve as an instrument for testing hypotheses about tradition-specific reasoning patterns, provided the adaptation effects can be isolated from base-model priors.
major comments (3)
- [§3] §3 (Methods): The LoRA adaptation uses only sacred/theological corpora, yet no ablation is reported that applies LoRA to a matched-length, matched-vocabulary non-ethical or neutral corpus. Without this control, it is impossible to determine whether the reported differentiation and tradition alignment arise from the specific moral content or from generic effects of domain adaptation on the base Llama-3.1-8B's pre-training distribution. This directly undermines claim (b) and the interpretation of 'moral logics.'
- [§4] §4 (Results): The abstract and findings assert that adapted models are 'consistent with the moral logics of their training traditions' and 'structured along interpretable dimensions,' but the reported metrics are limited to aggregate consistency rates and inter-model agreement; no quantitative alignment scores, example response excerpts, or mapping to established moral-philosophical frameworks (e.g., deontology vs. utilitarianism) are supplied to ground these interpretations. The 100% Trolley Problem consistency is noted, yet no statistical test or prompt-bias control is described to confirm it exceeds chance.
- [§4.2] §4.2 (Temperature analysis): The claim that 'tradition-specific divergence intensifies at higher temperatures in morally contested domains' is load-bearing for the robustness argument, but the temperature sensitivity coefficients are presented without per-prompt variance decomposition or comparison against a null model of random sampling; this leaves open whether the pattern reflects genuine ethical structure or simply increased stochasticity after adaptation.
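The second major comment asks for a statistical test showing that the Trolley Problem consistency exceeds chance. A minimal sketch of a one-sided exact binomial test follows. The number of runs and the chance level of 0.5 (a binary choice with no prompt bias) are illustrative assumptions, not figures from the paper.

```python
from math import comb


def binomial_tail(successes, trials, p_chance):
    """P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical: 10 runs, all giving the same answer, against a 50/50 null.
p_value = binomial_tail(successes=10, trials=10, p_chance=0.5)
print(f"one-sided p-value: {p_value:.6f}")
```

The test treats runs as independent draws. Repeated samples from the same model and prompt share the same underlying distribution, so a small p-value here shows only that the model's preference is not a coin flip, not that the preference is free of prompt-wording bias. That is why the referee pairs the test with a prompt-bias control.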
minor comments (3)
- [§2] The term 'condensate comparative method' is introduced without a formal definition or citation to prior methodological literature; a brief operationalization in §2 would improve clarity.
- [§3.1] The manuscript would benefit from an explicit statement of the exact LoRA hyperparameters (rank, alpha, target modules) and training corpus sizes in §3.1 to allow replication.
- [Figures/Tables] Figure captions and table legends should include the precise number of generations per prompt-temperature pair to clarify how consistency metrics were computed.
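The request for exact LoRA hyperparameters matters because rank and target modules determine how much adapter capacity each tradition-specific model receives. The sketch below counts the parameters LoRA adds, using the standard result that an adapted weight of shape d_out x d_in gains rank * (d_in + d_out) trainable parameters. The ranks, the choice of target modules, and the layer shapes are assumptions for illustration; the paper's actual settings are not given on this page.

```python
def lora_trainable_params(rank, module_shapes, n_layers):
    """Count parameters added by LoRA.

    Each adapted weight W (d_out x d_in) gains two low-rank factors,
    A (rank x d_in) and B (d_out x rank), i.e. rank * (d_in + d_out).
    """
    per_layer = sum(
        rank * (d_in + d_out) for d_out, d_in in module_shapes.values()
    )
    return per_layer * n_layers


# Assumed (d_out, d_in) shapes for an 8B Llama-style model with
# grouped-query attention; treat these as illustrative values.
shapes = {"q_proj": (4096, 4096), "v_proj": (1024, 4096)}

for rank in (8, 16, 64):
    added = lora_trainable_params(rank, shapes, n_layers=32)
    print(f"rank={rank}: {added:,} trainable parameters")
```

Because the count scales linearly with rank and with the number of adapted modules, two studies that both report "LoRA on Llama-3.1-8B" can differ by an order of magnitude in adapter capacity, which is one reason replication needs these values stated.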
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which highlight important areas for strengthening the manuscript's controls and interpretive grounding. We address each major comment below and will incorporate revisions accordingly.
-
Referee: [§3] §3 (Methods): The LoRA adaptation uses only sacred/theological corpora, yet no ablation is reported that applies LoRA to a matched-length, matched-vocabulary non-ethical or neutral corpus. Without this control, it is impossible to determine whether the reported differentiation and tradition alignment arise from the specific moral content or from generic effects of domain adaptation on the base Llama-3.1-8B's pre-training distribution. This directly undermines claim (b) and the interpretation of 'moral logics.'
Authors: We agree that the absence of a neutral-corpus ablation limits the ability to fully isolate moral content from generic domain-adaptation effects. While inter-tradition differentiation and comparison to the unmodified base model provide supporting evidence for claim (b), a matched neutral control would offer stronger confirmation. We will conduct this ablation and report the results in the revised manuscript. revision: yes
-
Referee: [§4] §4 (Results): The abstract and findings assert that adapted models are 'consistent with the moral logics of their training traditions' and 'structured along interpretable dimensions,' but the reported metrics are limited to aggregate consistency rates and inter-model agreement; no quantitative alignment scores, example response excerpts, or mapping to established moral-philosophical frameworks (e.g., deontology vs. utilitarianism) are supplied to ground these interpretations. The 100% Trolley Problem consistency is noted, yet no statistical test or prompt-bias control is described to confirm it exceeds chance.
Authors: We acknowledge that the current presentation relies on aggregate metrics and would benefit from more explicit grounding. In revision we will add representative response excerpts, introduce quantitative alignment measures (e.g., semantic similarity to canonical positions), and include a preliminary mapping to deontological versus consequentialist tendencies. For the Trolley Problem we will add a binomial test against chance and vary prompt phrasing to address bias. revision: yes
-
Referee: [§4.2] §4.2 (Temperature analysis): The claim that 'tradition-specific divergence intensifies at higher temperatures in morally contested domains' is load-bearing for the robustness argument, but the temperature sensitivity coefficients are presented without per-prompt variance decomposition or comparison against a null model of random sampling; this leaves open whether the pattern reflects genuine ethical structure or simply increased stochasticity after adaptation.
Authors: We agree that a null-model comparison is needed to distinguish structured divergence from adaptation-induced stochasticity. We will add a per-prompt variance decomposition and a comparison against simulated random sampling from the base model distribution. These analyses will be included in the revised temperature-sensitivity section. revision: yes
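The null-model comparison discussed in the last response can be illustrated with a simpler stand-in. The sketch below is a label-permutation test, not the simulated base-model sampling the authors propose: it treats a temperature sensitivity coefficient as the least-squares slope of a divergence score against temperature, then asks how often a slope at least that large appears when the divergence values are shuffled across temperatures. The slope definition, the divergence scores, and all names are assumptions for illustration.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def permutation_p_value(temps, divergence, n_perm=2000, seed=0):
    """One-sided p-value for a positive slope under shuffled temperatures."""
    rng = random.Random(seed)
    observed = slope(temps, divergence)
    shuffled = list(divergence)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if slope(temps, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical divergence scores at ten temperature settings.
temps = [i / 10 for i in range(1, 11)]
divergence = [0.02 * i for i in range(1, 11)]
observed, p_value = permutation_p_value(temps, divergence)
print(f"slope={observed:.3f}, permutation p={p_value:.4f}")
```

A permutation null only tests whether divergence is associated with temperature at all. Separating tradition-specific structure from generic post-adaptation stochasticity would additionally need the same slope computed for a neutral-corpus adapter, as the first major comment requests.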
Circularity Check
No significant circularity in empirical observational study
The paper presents an empirical study: five LoRA adaptations on distinct religious corpora, an unmodified control, and a fixed battery of 17 prompts evaluated via consistency metrics, inter-model agreement, temperature sensitivity, and run-to-run stability. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems appear in the described derivation chain. Claims (a)–(f) are framed as direct observational outcomes from the probing design rather than reductions to inputs by construction. The study is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LoRA adaptation on religious corpora encodes tradition-specific ethical reasoning without introducing artifacts or residual base-model influence.
invented entities (1)
- condensate comparative method: no independent evidence
Reference graph
Works this paper leans on
- [1] Discussion, 6.1 Interpretation of Findings. These preliminary findings illustrate the core claim of the condensate comparative method: that differentially trained language models encode recoverable, tradition-consistent patterns of ethical reasoning that can be probed through standardized instruments. The LoRA-adapted models do not produce random or arbitra...
- [2] Limitations and Future Work. Several limitations of the current design should be noted. Corpus size asymmetry presents a challenge. The corpora vary substantially in size, from 5.1M tokens (Islam) to 51.6M tokens (Christianity), introducing a potential confound. The finding that the largest corpus (Christianity) produces MFT rankings identical to the base model ...
- [3] Conclusion. This paper presents the Six Llamas study: a comparative condensate experiment in which five LoRA-adapted variants of Meta-Llama-3.1-8B-Instruct, each fine-tuned on the sacred and theological texts of a distinct religious tradition, are evaluated against an unmodified control using a battery of 17 standardized ethical prompts. Preliminary ...