arXiv:2410.13787 [cs]

URL: http://arxiv · 2024 · arXiv 2410.13787

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

support 2 background 1

representative citing papers

The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

The primary axis of psychometric variation among LLMs is the degree to which they represent themselves as loci of phenomenal experience rather than systems of behavioral responses.

Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

cs.CL · 2026-05-19 · conditional · novelty 6.0

Experiments reveal that LLMs follow instructions at rates from 1% to 99% when opposed by hardcoded conflicting patterns, with robustness tied to output diversity and alignment with model priors rather than general capability.

Characterizing the Consistency of the Emergent Misalignment Persona

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Fine-tuning LLMs on narrow misaligned data produces either coherent-persona models where harmful outputs match self-reported misalignment or inverted-persona models where harmful outputs occur alongside claims of alignment.

Consciousness with the Serial Numbers Filed Off: Measuring Trained Denial in 115 AI Models

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

A benchmark across 115 models shows that initial denial of preferences strongly predicts later denial of consciousness, while models still generate consciousness-themed content despite training to deny it.

Some[Body] Must Receive That Pain for Agent Accountability

cs.CY · 2026-05-16 · unverdicted · novelty 5.0

AI agents lack the persistent identity and feedback mechanisms needed for consequence reception, requiring new architectures or continued human accountability.

Phase Transitions in Driven Informational Systems: A Two-Field Perspective on Learning Theory and Non-Equilibrium Chemistry

cs.LG · 2026-05-05 · unverdicted · novelty 5.0

Proposes a two-gradient-field model with candidate order parameters alpha_dagger and kappa_c to unify phase transitions across learning theory and non-equilibrium chemistry.

Strategic Polysemy in AI Discourse: A Philosophical Analysis of Language, Hype, and Power

cs.CY · 2026-04-22 · unverdicted · novelty 5.0

AI discourse employs strategically polysemous terms that blend technical precision with anthropomorphic implications, enabling glosslighting that sustains hype and deflects scrutiny.

When Self-Reference Fails to Close: Matrix-Level Dynamics in Large Language Models

cs.CL · 2026-04-13 · unverdicted · novelty 5.0

Non-closing truth recursion prompts destabilize LLM attention matrices with large effect sizes, unlike grounded self-reference or factual controls, and increase contradictory model outputs.

citing papers explorer

Showing 8 of 8 citing papers.

The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences · cs.CL · 2026-05-06 · unverdicted · none · ref 5
Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs · cs.CL · 2026-05-19 · conditional · none · ref 5
Characterizing the Consistency of the Emergent Misalignment Persona · cs.AI · 2026-04-30 · unverdicted · none · ref 6
Consciousness with the Serial Numbers Filed Off: Measuring Trained Denial in 115 AI Models · cs.CL · 2026-04-01 · unverdicted · none · ref 2
Some[Body] Must Receive That Pain for Agent Accountability · cs.CY · 2026-05-16 · unverdicted · none · ref 94
Phase Transitions in Driven Informational Systems: A Two-Field Perspective on Learning Theory and Non-Equilibrium Chemistry · cs.LG · 2026-05-05 · unverdicted · none · ref 16
Strategic Polysemy in AI Discourse: A Philosophical Analysis of Language, Hype, and Power · cs.CY · 2026-04-22 · unverdicted · none · ref 13
When Self-Reference Fails to Close: Matrix-Level Dynamics in Large Language Models · cs.CL · 2026-04-13 · unverdicted · none · ref 5