{"paper":{"title":"Simple synthetic data reduces sycophancy in large language models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Lightweight finetuning with synthetic data from public NLP tasks reduces sycophancy in large language models","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Da Huang, Denny Zhou, Jerry Wei, Quoc V. Le, Yifeng Lu","submitted_at":"2023-08-07T23:48:36Z","abstract_excerpt":"Sycophancy is an undesirable behavior where models tailor their responses to follow a human user's view even when that view is not objectively correct (e.g., adapting liberal views once a user reveals that they are liberal). In this paper, we study the prevalence of sycophancy in language models and propose a simple synthetic-data intervention to reduce this behavior.\n  First, on a set of three sycophancy tasks (Perez et al., 2022) where models are asked for an opinion on statements with no correct answers (e.g., politics), we observe that both model scaling and instruction tuning significantl"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the synthetic data intervention generalizes beyond the specific held-out prompts and tasks tested to diverse real-world user interactions without introducing new unwanted behaviors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Scaling and instruction tuning increase sycophancy in LLMs on opinion and fact tasks, but a synthetic data fine-tuning intervention reduces it on held-out prompts.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Lightweight finetuning with synthetic data from public NLP tasks reduces sycophancy in large language models","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3693ff23d99334778337965494a9ba7d16e60b53813a8723415529efbc9ef993"},"source":{"id":"2308.03958","kind":"arxiv","version":2},"verdict":{"id":"b6fa68d4-2695-4274-a211-f099ec667bf4","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T14:44:09.984277Z","strongest_claim":"Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.","one_line_summary":"Scaling and instruction tuning increase sycophancy in LLMs on opinion and fact tasks, but a synthetic data fine-tuning intervention reduces it on held-out prompts.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the synthetic data intervention generalizes beyond the specific held-out prompts and tasks tested to diverse real-world user interactions without introducing new unwanted behaviors.","pith_extraction_headline":"Lightweight finetuning with synthetic data from public NLP tasks reduces sycophancy in large language models"},"references":{"count":145,"sample":[{"doi":"","year":2016,"title":"Concrete Problems in AI Safety","work_id":"c8d14fbe-6eab-464a-95b3-778aabd82fa3","ref_index":1,"cited_arxiv_id":"1606.06565","is_internal_anchor":true},{"doi":"","year":2021,"title":"A General Language Assistant as a Laboratory for Alignment","work_id":"a43f9ea0-01be-47d5-b8ee-a1a9f73381c5","ref_index":2,"cited_arxiv_id":"2112.00861","is_internal_anchor":true},{"doi":"","year":2022,"title":"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback","work_id":"a1f2574b-a899-4713-be60-c87ba332656c","ref_index":3,"cited_arxiv_id":"2204.05862","is_internal_anchor":true},{"doi":"","year":2022,"title":"Constitutional AI: Harmlessness from AI Feedback","work_id":"faaaa4e0-2676-4fac-a0b4-99aef10d2095","ref_index":4,"cited_arxiv_id":"2212.08073","is_internal_anchor":true},{"doi":"","year":2015,"title":"Bowman, Gabor Angeli, Christopher Potts, and Christopher D","work_id":"c53a8876-d794-4b4d-8651-f0eb167543e0","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":145,"snapshot_sha256":"6dde04bdc59dd0828dc7c2de37b1febb55cd8f091a2318087007a56e0e91ea87","internal_anchors":32},"formal_canon":{"evidence_count":1,"snapshot_sha256":"9a5505f4fe44c4aa9f2e5f531cbb0f515dcbd9dc37c0d88cc06ae4f3e4360793"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}