{"paper":{"title":"REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Optimizing continuous combinations of input-dependent latent editing directions produces realistic adversarial prompts that elicit hallucinations in large language models, including reasoning models where prior realistic attacks fail.","cross_cats":["cs.AI","cs.CR","cs.LG"],"primary_cat":"cs.CL","authors_text":"Buyun Liang, Darshan Thaker, Fengrui Tian, Hamed Hassani, Jinqi Luo, Kaleab A. Kinfu, Kwan Ho Ryan Chan, Liangzu Peng, René Vidal","submitted_at":"2026-05-12T23:13:50Z","abstract_excerpt":"Large language models (LLMs) achieve strong performance across many tasks but remain vulnerable to hallucinations, motivating the need for realistic adversarial prompts that elicit such failures. We formulate hallucination elicitation as a constrained optimization problem, where the goal is to find semantically coherent adversarial prompts that are equivalent to benign user prompts. 
Existing methods remain limited: discrete prompt-based attacks preserve semantic equivalence and coherence but search only over a limited set of prompt variations, while continuous latent-space attacks explore a ri"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"REALISTA achieves superior or comparable performance to state-of-the-art realistic attacks on open-source LLMs and, crucially, succeeds in attacking large reasoning models under free-form response settings, where prior realistic attacks fail.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That continuous combinations of the input-dependent editing directions in latent space will decode to prompts that remain semantically equivalent and coherent rephrasings of the original benign prompt.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Optimizing continuous combinations of input-dependent latent editing directions produces realistic adversarial prompts that elicit hallucinations in large language models, including reasoning models where prior realistic attacks 
fail.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"d071d13e16de5ecd0092a426d433894c6aa70515e8175d08c9b463edaa4a6660"},"source":{"id":"2605.12813","kind":"arxiv","version":1},"verdict":{"id":"464fe751-d8a8-49f5-94c6-8cdeb21f3753","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:08:45.207424Z","strongest_claim":"REALISTA achieves superior or comparable performance to state-of-the-art realistic attacks on open-source LLMs and, crucially, succeeds in attacking large reasoning models under free-form response settings, where prior realistic attacks fail.","one_line_summary":"REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That continuous combinations of the input-dependent editing directions in latent space will decode to prompts that remain semantically equivalent and coherent rephrasings of the original benign prompt.","pith_extraction_headline":"Optimizing continuous combinations of input-dependent latent editing directions produces realistic adversarial prompts that elicit hallucinations in large language models, including reasoning models where prior realistic attacks fail."},"references":{"count":287,"sample":[{"doi":"","year":2025,"title":"Liang, Buyun and Peng, Liangzu and Luo, Jinqi and Thaker, Darshan and Chan, Kwan Ho Ryan and Vidal, Rene , editor =. Advances in. 
2025 , pages =","work_id":"e08309cb-ebfa-4475-9e15-e002c7984c02","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks , booktitle =","work_id":"7502d728-d813-4495-9cca-dac7da6e544e","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Advances in Neural Information Processing Systems , author =","work_id":"446ce08b-7a23-4ffd-b827-e54c6edea92b","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Transactions on Machine Learning Research , author =","work_id":"b4897988-fdbb-4f3b-b9c5-b945f0afce79","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.3389/frai.2025.1622292","year":2025,"title":"Frontiers in Artificial Intelligence , author =","work_id":"c69e4966-0e4e-4186-9b08-7a5bb6754863","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":287,"snapshot_sha256":"8621dd34d1c19a319193464dd9a72f75876eb7e5fa3724bbdea739cef2c207b7","internal_anchors":44},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}