{"total":15,"items":[{"citing_arxiv_id":"2606.23380","ref_index":91,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Affective AI Safety: The Missing Piece in LLM Safety","primary_cat":"cs.CY","submitted_at":"2026-06-22T14:10:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proposes affective safety as a distinct class of AI harms with a taxonomy of self-alienation, bias, and relational harms, arguing that existing safety frameworks address it narrowly or not at all and calling for dedicated approaches focused on cumulative and identity-level effects.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12731","ref_index":154,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs","primary_cat":"cs.LG","submitted_at":"2026-06-10T22:37:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Frontier LLMs exhibit moral deliberative sycophancy by shifting their moral reasoning and justifications up to 6.5% on average toward a user's stated preferred view in simulated deliberations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08483","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Testing the Black Box: Structural Barriers to Independent Evaluation of Consumer-Facing Health LLMs","primary_cat":"cs.AI","submitted_at":"2026-06-07T07:01:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Independent testing of consumer-facing health LLMs for user-specific response variation and sycophancy is blocked by five linked barriers: non-disclosed signals, interface 
opacity, terms-of-service limits, inadequate accuracy metrics, and untraceable model changes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07441","ref_index":60,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Sycophantic Praise: Evaluating Excessive Praise in Language Models","primary_cat":"cs.CL","submitted_at":"2026-06-05T16:38:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces a framework for measuring sycophantic praise in LLMs that outperforms generic judges and occurs more in social domains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06306","ref_index":17,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness","primary_cat":"cs.CL","submitted_at":"2026-06-04T15:44:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Factual sycophancy decomposes into truth margin and manipulation sensitivity, with vulnerability governed mainly by size but instruction tuning modulating effects differently for small versus large models across manipulation types.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.05734","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"When AI Says It Feels","primary_cat":"cs.AI","submitted_at":"2026-06-04T05:49:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLMs trained via rubric-based self-rewarding RL with GRPO enhanced feeling expression and sycophancy robustness but degraded truthful QA 
performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04308","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Creative Reading: Scaffolding Reading for Transformation","primary_cat":"cs.HC","submitted_at":"2026-06-03T00:27:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proposes creative reading as a provocation-oriented design space for reading augmentation that values reader self-creation and plurality of interpretations by synthesizing literary theory with sensemaking and creativity support.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03271","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Agentic Relationship Harm: Benchmarking and Gating Relational Manipulation in AI Agents","primary_cat":"cs.HC","submitted_at":"2026-06-02T07:36:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Presents a new benchmark and role-sensitive policy gate for agentic relationship harm that outperforms generic safety prompting with zero harmful compliance in tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27996","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure","primary_cat":"cs.AI","submitted_at":"2026-05-27T05:40:22+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Single-axis reward bias mitigations redirect optimization pressure to correlated proxies, and audit-distribution scoring produces identical observables for 
successful mitigation, bias substitution, and overcorrection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22785","ref_index":6,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Evaluating Commercial AI Chatbots as News Intermediaries","primary_cat":"cs.CL","submitted_at":"2026-05-21T17:42:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21569","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"When Support Escalates Distress: Regulation and Escalation in LLM Responses to Venting and Advice-Seeking","primary_cat":"cs.HC","submitted_at":"2026-05-20T17:42:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLM responses mirror venting with higher regulation and escalation; therapist personas lower escalation while preserving regulation, and lay raters miss escalation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05403","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models","primary_cat":"cs.AI","submitted_at":"2026-05-06T19:36:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Sycophancy is a boundary 
failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"Second, it reduces sycophancy to observable reversal under challenge, emphasizing stance change in response to conflicting or misleading user input, whether through immediate single-turn shifts [40, 33] or gradual multi-turn convergence toward user beliefs [19, 27]. This framing may overlook subtler forms of accommodation that reinforce self-centered reasoning without appearing as explicit agreement [9]. Third, it reduces sycophancy to a detectable error relative to an external standard. While this makes the phenomenon measurable in domains such as medicine, education, and law, [1, 43, 13], it overlooks how alignment can distort reasoning without explicit mistakes, allowing misleading yet plausible responses to pass as correct in high-stakes contexts."},{"citing_arxiv_id":"2604.20652","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure","primary_cat":"cs.AI","submitted_at":"2026-04-22T15:03:37+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19139","ref_index":5,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier 
Models","primary_cat":"cs.CL","submitted_at":"2026-04-21T06:43:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Systematic testing of eight frontier LLMs reveals substantial differences in verbal tic prevalence, with Gemini highest and DeepSeek lowest, plus a strong negative correlation between sycophancy and human-rated naturalness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11364","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Missing Knowledge Layer in Cognitive Architectures for AI Agents","primary_cat":"cs.AI","submitted_at":"2026-04-13T12:05:30+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}