arXiv preprint arXiv:2504.06460

Can LLMs Simulate Personas with Reversed Performance? A Benchmark for Counterfactual Instruction Following · 2025 · arXiv:2504.06460

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

PeerMathDial: A Middle School Dialogue Dataset for Student Collaborative Math Problem Solving

cs.CL · 2026-06-19 · unverdicted · novelty 8.0

Introduces PeerMathDial, the first authentic middle school peer CPS dialogue dataset with 55 dialogues and 6,406 turns, a corpus-grounded dialogue act taxonomy, and three demonstrated use cases.

When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following

cs.CL · 2026-06-08 · unverdicted · novelty 6.0

Thinking mode in Qwen3 models improves class-level performance on planning constraints but worsens precision constraints in IFEval, with 10-20% prompt-level flips and directional consistency in Hunyuan models.

Prompt Governance? On Governing Technologies Governed by Natural Language

cs.CY · 2026-04-29 · unverdicted · novelty 4.0

Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.

citing papers explorer

Showing 2 of 2 citing papers after filters.

PeerMathDial: A Middle School Dialogue Dataset for Student Collaborative Math Problem Solving

cs.CL · 2026-06-19 · unverdicted · none · ref 23

Introduces PeerMathDial, the first authentic middle school peer CPS dialogue dataset with 55 dialogues and 6,406 turns, a corpus-grounded dialogue act taxonomy, and three demonstrated use cases.

When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following

cs.CL · 2026-06-08 · unverdicted · none · ref 3

Thinking mode in Qwen3 models improves class-level performance on planning constraints but worsens precision constraints in IFEval, with 10-20% prompt-level flips and directional consistency in Hunyuan models.
