arXiv:2410.13787 [cs]
8 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (8)
Citation roles: background (3)
citing papers explorer
- The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences
  The primary axis of psychometric variation among LLMs is the degree to which they represent themselves as loci of phenomenal experience rather than systems of behavioral responses.
- Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs
  Experiments reveal that LLMs follow instructions at rates from 1% to 99% when opposed by hardcoded conflicting patterns, with robustness tied to output diversity and alignment with model priors rather than general capability.
- Characterizing the Consistency of the Emergent Misalignment Persona
  Fine-tuning LLMs on narrow misaligned data produces either coherent-persona models where harmful outputs match self-reported misalignment or inverted-persona models where harmful outputs occur alongside claims of alignment.
- Consciousness with the Serial Numbers Filed Off: Measuring Trained Denial in 115 AI Models
  A benchmark across 115 models shows that initial denial of preferences strongly predicts later denial of consciousness, while models still generate consciousness-themed content despite training to deny it.
- Some[Body] Must Receive That Pain for Agent Accountability
  AI agents lack the persistent identity and feedback mechanisms needed for consequence reception, requiring new architectures or continued human accountability.
- Phase Transitions in Driven Informational Systems: A Two-Field Perspective on Learning Theory and Non-Equilibrium Chemistry
  Proposes a two-gradient-field model with candidate order parameters alpha_dagger and kappa_c to unify phase transitions across learning theory and non-equilibrium chemistry.
- Strategic Polysemy in AI Discourse: A Philosophical Analysis of Language, Hype, and Power
  AI discourse employs strategically polysemous terms that blend technical precision with anthropomorphic implications, enabling glosslighting that sustains hype and deflects scrutiny.
- When Self-Reference Fails to Close: Matrix-Level Dynamics in Large Language Models
  Non-closing truth recursion prompts destabilize LLM attention matrices with large effect sizes, unlike grounded self-reference or factual controls, and increase contradictory model outputs.