3 Pith papers cite this work.
Fields: cs.CL (3)
citing papers explorer
- Instructions Shape Production of Language, not Processing
  Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
- Identifying Bias in Machine-generated Text Detection
  Machine-generated text detectors show demographic biases, flagging ELL essays and some disadvantaged groups more often as AI-written while humans show no such biases.
- EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models
  EvalMORAAL evaluates moral alignment of 20 LLMs on World Values Survey and PEW data, reporting high overall correlation with human responses but a 0.21 gap between Western and non-Western regions.