Questioning the Survey Responses of Large Language Models

Ricardo Dominguez-Olmedo, Moritz Hardt · 2023 · arXiv 2306.07951

4 Pith papers cite this work.

Representative citing papers

Do Large Language Models Plan Answer Positions? Position Bias in Multiple-Choice Question Generation

cs.CL · 2026-05-03 · unverdicted · novelty 6.0 · ref 15

LLMs implicitly plan answer positions during MCQ generation, as shown by predictive signals in hidden representations and controllable shifts via activation steering.

Graph-Based Alternatives to LLMs for Human Simulation

cs.CL · 2025-11-03 · conditional · novelty 6.0 · ref 17

GEMS formulates close-ended human-behavior simulation as link prediction on a heterogeneous graph and matches or exceeds LLM performance with three orders of magnitude fewer parameters across three datasets and three evaluation settings.

AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction

cs.CL · 2023-05-16 · unverdicted · novelty 6.0 · ref 34

LLM embeddings enable strong retrodiction of masked GSS opinions under both cross-validation and external validation, but only modest performance on entirely unasked opinions.

Position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

cs.LG · 2025-07-30 · unverdicted · novelty 4.0 · ref 20

Human tests should not be applied to AI to measure traits like intelligence because of calibration, validity, contamination, and prompt-sensitivity issues; AI-specific evaluation frameworks should be developed instead.
