{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2023:V4ZA3CSKC2AQCOCC5XHQZFZSSX","short_pith_number":"pith:V4ZA3CSK","schema_version":"1.0","canonical_sha256":"af320d8a4a1681013842edcf0c973295e717490f13d21ee372c098f73f2723d0","source":{"kind":"arxiv","id":"2308.03958","version":2},"attestation_state":"computed","paper":{"title":"Simple synthetic data reduces sycophancy in large language models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Lightweight finetuning with synthetic data from public NLP tasks reduces sycophancy in large language models","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Da Huang, Denny Zhou, Jerry Wei, Quoc V. Le, Yifeng Lu","submitted_at":"2023-08-07T23:48:36Z","abstract_excerpt":"Sycophancy is an undesirable behavior where models tailor their responses to follow a human user's view even when that view is not objectively correct (e.g., adapting liberal views once a user reveals that they are liberal). In this paper, we study the prevalence of sycophancy in language models and propose a simple synthetic-data intervention to reduce this behavior.\n  First, on a set of three sycophancy tasks (Perez et al., 2022) where models are asked for an opinion on statements with no correct answers (e.g., politics), we observe that both model scaling and instruction tuning significantl"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2308.03958","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2023-08-07T23:48:36Z","cross_cats_sorted":[],"title_canon_sha256":"d0e4dfc2b580fa38d2be5fcb7ef8aa6e484c45f6a8a56f49ce6f223a65055d53","abstract_canon_sha256":"a8209c615d148b9112f883a6d9707e6469cf500a33ec3664124fa266b1df7207"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:47.542161Z","signature_b64":"OygM4IY7Ct2yfSdiPRBBnlI/kdsnrgFxVKpf8dz6zl+fgdTzGMDSKi07ZMHd7iTzYbrFxQXi5VNbikZzbrEVDw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"af320d8a4a1681013842edcf0c973295e717490f13d21ee372c098f73f2723d0","last_reissued_at":"2026-05-17T23:38:47.541673Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:47.541673Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Simple synthetic data reduces sycophancy in large language models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Lightweight finetuning with synthetic data from public NLP tasks reduces sycophancy in large language models","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Da Huang, Denny Zhou, Jerry Wei, Quoc V. Le, Yifeng Lu","submitted_at":"2023-08-07T23:48:36Z","abstract_excerpt":"Sycophancy is an undesirable behavior where models tailor their responses to follow a human user's view even when that view is not objectively correct (e.g., adapting liberal views once a user reveals that they are liberal). In this paper, we study the prevalence of sycophancy in language models and propose a simple synthetic-data intervention to reduce this behavior.\n  First, on a set of three sycophancy tasks (Perez et al., 2022) where models are asked for an opinion on statements with no correct answers (e.g., politics), we observe that both model scaling and instruction tuning significantl"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the synthetic data intervention generalizes beyond the specific held-out prompts and tasks tested to diverse real-world user interactions without introducing new unwanted behaviors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Scaling and instruction tuning increase sycophancy in LLMs on opinion and fact tasks, but a synthetic data fine-tuning intervention reduces it on held-out prompts.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Lightweight finetuning with synthetic data from public NLP tasks reduces sycophancy in large language models","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3693ff23d99334778337965494a9ba7d16e60b53813a8723415529efbc9ef993"},"source":{"id":"2308.03958","kind":"arxiv","version":2},"verdict":{"id":"b6fa68d4-2695-4274-a211-f099ec667bf4","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T14:44:09.984277Z","strongest_claim":"Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.","one_line_summary":"Scaling and instruction tuning increase sycophancy in LLMs on opinion and fact tasks, but a synthetic data fine-tuning intervention reduces it on held-out prompts.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the synthetic data intervention generalizes beyond the specific held-out prompts and tasks tested to diverse real-world user interactions without introducing new unwanted behaviors.","pith_extraction_headline":"Lightweight finetuning with synthetic data from public NLP tasks reduces sycophancy in large language models"},"references":{"count":145,"sample":[{"doi":"","year":2016,"title":"Concrete Problems in AI Safety","work_id":"c8d14fbe-6eab-464a-95b3-778aabd82fa3","ref_index":1,"cited_arxiv_id":"1606.06565","is_internal_anchor":true},{"doi":"","year":2021,"title":"A General Language Assistant as a Laboratory for Alignment","work_id":"a43f9ea0-01be-47d5-b8ee-a1a9f73381c5","ref_index":2,"cited_arxiv_id":"2112.00861","is_internal_anchor":true},{"doi":"","year":2022,"title":"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback","work_id":"a1f2574b-a899-4713-be60-c87ba332656c","ref_index":3,"cited_arxiv_id":"2204.05862","is_internal_anchor":true},{"doi":"","year":2022,"title":"Constitutional AI: Harmlessness from AI Feedback","work_id":"faaaa4e0-2676-4fac-a0b4-99aef10d2095","ref_index":4,"cited_arxiv_id":"2212.08073","is_internal_anchor":true},{"doi":"","year":2015,"title":"Bowman, Gabor Angeli, Christopher Potts, and Christopher D","work_id":"c53a8876-d794-4b4d-8651-f0eb167543e0","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":145,"snapshot_sha256":"6dde04bdc59dd0828dc7c2de37b1febb55cd8f091a2318087007a56e0e91ea87","internal_anchors":32},"formal_canon":{"evidence_count":1,"snapshot_sha256":"9a5505f4fe44c4aa9f2e5f531cbb0f515dcbd9dc37c0d88cc06ae4f3e4360793"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2308.03958","created_at":"2026-05-17T23:38:47.541755+00:00"},{"alias_kind":"arxiv_version","alias_value":"2308.03958v2","created_at":"2026-05-17T23:38:47.541755+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2308.03958","created_at":"2026-05-17T23:38:47.541755+00:00"},{"alias_kind":"pith_short_12","alias_value":"V4ZA3CSKC2AQ","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"V4ZA3CSKC2AQCOCC","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"V4ZA3CSK","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2605.21834","citing_title":"On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15207","citing_title":"TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2506.07180","citing_title":"Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2508.15815","citing_title":"User-Assistant Bias in LLMs","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2508.16846","citing_title":"BASIL: Bayesian Assessment of Sycophancy in LLMs","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2510.07517","citing_title":"When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning","ref_index":47,"is_internal_anchor":true},{"citing_arxiv_id":"2309.05653","citing_title":"MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning","ref_index":59,"is_internal_anchor":true},{"citing_arxiv_id":"2601.10467","citing_title":"User Detection and Response Patterns of Sycophantic Behavior in Conversational AI","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2501.09686","citing_title":"Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models","ref_index":163,"is_internal_anchor":true},{"citing_arxiv_id":"2603.18373","citing_title":"To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02423","citing_title":"SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08409","citing_title":"Playing games with knowledge: AI-Induced delusions need game theoretic interventions","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09314","citing_title":"How LLMs Are Persuaded: A Few Attention Heads, Rerouted","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09228","citing_title":"ProactBench: Beyond What The User Asked For","ref_index":153,"is_internal_anchor":true},{"citing_arxiv_id":"2604.22193","citing_title":"How Large Language Models Balance Internal Knowledge with User and Document Assertions","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01302","citing_title":"Beyond Semantic Relevance: Counterfactual Risk Minimization for Robust Retrieval-Augmented Generation","ref_index":74,"is_internal_anchor":true},{"citing_arxiv_id":"2604.20652","citing_title":"Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11609","citing_title":"Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2604.10733","citing_title":"Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models","ref_index":50,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07012","citing_title":"Exploring the \"Banality\" of Deception in Generative AI","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05957","citing_title":"Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07369","citing_title":"The Role of Emotional Stimuli and Intensity in Shaping Large Language Model Behavior","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2604.05279","citing_title":"Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13602","citing_title":"Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges","ref_index":53,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13803","citing_title":"Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation","ref_index":62,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":1,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX","json":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX.json","graph_json":"https://pith.science/api/pith-number/V4ZA3CSKC2AQCOCC5XHQZFZSSX/graph.json","events_json":"https://pith.science/api/pith-number/V4ZA3CSKC2AQCOCC5XHQZFZSSX/events.json","paper":"https://pith.science/paper/V4ZA3CSK"},"agent_actions":{"view_html":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX","download_json":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX.json","view_paper":"https://pith.science/paper/V4ZA3CSK","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2308.03958&json=true","fetch_graph":"https://pith.science/api/pith-number/V4ZA3CSKC2AQCOCC5XHQZFZSSX/graph.json","fetch_events":"https://pith.science/api/pith-number/V4ZA3CSKC2AQCOCC5XHQZFZSSX/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX/action/timestamp_anchor","attest_storage":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX/action/storage_attestation","attest_author":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX/action/author_attestation","sign_citation":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX/action/citation_signature","submit_replication":"https://pith.science/pith/V4ZA3CSKC2AQCOCC5XHQZFZSSX/action/replication_record"}},"created_at":"2026-05-17T23:38:47.541755+00:00","updated_at":"2026-05-17T23:38:47.541755+00:00"}