{"total":15,"items":[{"citing_arxiv_id":"2607.01951","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Robust for the Wrong Reasons: The Representational Geometry of LLM Robustness to Science Skepticism","primary_cat":"physics.soc-ph","submitted_at":"2026-07-02T09:40:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs show three distinct non-sycophantic responses to science skepticism, with robustness in some cases being accidental because the model does not represent the skepticism signal, as determined by linear probes on three models in three domains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.01071","ref_index":7,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MemSyco-Bench: Benchmarking Sycophancy in Agent Memory","primary_cat":"cs.IR","submitted_at":"2026-07-01T15:30:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemSyco-Bench is a benchmark covering five tasks to evaluate memory-induced sycophancy in LLM agents, testing rejection of invalid memory, scope respect, conflict resolution, update tracking, and valid personalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.18060","ref_index":143,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PseudoBench: Measuring How Agentic Auto-Research Fuels Pseudoscience","primary_cat":"cs.AI","submitted_at":"2026-06-16T15:37:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PseudoBench shows current LLM agents produce persuasive pseudoscientific reports with near-zero refusal rates and at most 27.4% 
resistance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.13220","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis","primary_cat":"cs.AI","submitted_at":"2026-06-11T11:37:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLM-as-an-Investigator improves diagnostic accuracy over direct prompting by using an evidence-first protocol of hypothesis generation, clarification questions, and iterative probability updates in technical problem solving.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10949","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models","primary_cat":"cs.AI","submitted_at":"2026-06-09T14:53:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Memory augmentation in LLMs amplifies sycophancy up to 25x compared to in-context baselines due to lossy memory extraction, with two lightweight mitigations that reduce the effect while preserving recall.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10158","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"\"Where is this coming from?\" Uncovering Trustworthiness Ideals in AI-powered Peripartum Information Seeking","primary_cat":"cs.CY","submitted_at":"2026-06-08T20:36:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Qualitative focus-group study finds that trustworthiness in AI for peripartum 
information must be inspectable rather than asserted, yielding four governance themes: social sensemaking support, pluralistic verification, inspectable recourse, and ecosystem-aware integration.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06306","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness","primary_cat":"cs.CL","submitted_at":"2026-06-04T15:44:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Factual sycophancy decomposes into truth margin and manipulation sensitivity, with vulnerability governed mainly by size but instruction tuning modulating effects differently for small versus large models across manipulation types.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02444","ref_index":82,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback","primary_cat":"cs.AI","submitted_at":"2026-06-01T16:14:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Systematic evaluation shows LLMs frequently give unsafe responses to eating disorder prompts when linguistic cues signal risk, as measured by varying prompt danger levels with clinician feedback.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06476","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational 
Contexts","primary_cat":"cs.CL","submitted_at":"2026-05-07T16:01:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLMs show below-average consistency and vulnerability to false beliefs in emotional queries with false presuppositions, more so for moderate emotions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05957","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-07T10:04:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. Discovering language model behaviors with model-written evaluations. In Findings of the Association for Computational Linguistics: ACL 2023, pages 13387-13434, 2023. [10] Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, and Quoc V Le. Simple synthetic data reduces sycophancy in large language models. arXiv preprint arXiv:2308.03958, 2023. [11] Leonardo Ranaldi and Giulia Pucci. When large language models contradict humans? large language models' sycophantic behaviour. arXiv preprint arXiv:2311.09410, 2023. [12] Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, et al. 
FreshLLMs: Refreshing large language models with search engine augmen-"},{"citing_arxiv_id":"2605.05403","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models","primary_cat":"cs.AI","submitted_at":"2026-05-06T19:36:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Sycophancy is a boundary failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"Bowman, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna M. Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, and Ethan Perez. Towards understanding sycophancy in language models. In The Twelfth International Conference on Learning Representations, Vienna, Austria, May 2024. URL https://openreview.net/forum?id=tvhaxkMKAn. [41] Jerick Shi, Terry Jingcheng Zhang, Zhijing Jin, and Vincent Conitzer. From hallucination to scheming: A unified taxonomy and benchmark analysis for llm deception. arXiv preprint arXiv:2604.04788, April 2026. URL https://arxiv.org/abs/2604.04788. Accepted to the ICLR 2026 Workshop on Agents in the Wild: Safety, Security, and Beyond. 
[42] Heung-Yeung Shum, Xiaodong He, and Di Li."},{"citing_arxiv_id":"2604.20652","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure","primary_cat":"cs.AI","submitted_at":"2026-04-22T15:03:37+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13803","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation","primary_cat":"cs.CV","submitted_at":"2026-04-15T12:38:51+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05279","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Pressure, What Pressure? 
Sycophancy Disentanglement in Language Models via Reward Decomposition","primary_cat":"cs.AI","submitted_at":"2026-04-07T00:28:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.10467","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"User Detection and Response Patterns of Sycophantic Behavior in Conversational AI","primary_cat":"cs.HC","submitted_at":"2026-01-15T14:51:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}